{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Unprobed SeqFISH gene analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook demonstrates imputation of unprobed genes for seqFISH ST dataset.\n", "\n", "Input data can be downloaded from [seqfish.tar.gz](https://zenodo.org/record/8214466/files/seqfish.tar.gz?download=1), please extract it with\n", "```sh\n", "tar -xzvf seqfish.tar.gz\n", "```\n", "and change the variable `pre_datapath` in the next cell accordingly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import warnings\n", "import pickle\n", "\n", "import squidpy as sq\n", "import numpy as np\n", "import scanpy as sc\n", "import pandas as pd\n", "\n", "from matplotlib import pyplot as plt\n", "import seaborn as sns\n", "from transpa.util import expTransImp\n", "\n", "\n", "warnings.filterwarnings('ignore')\n", "pre_datapath = \"../../output/preprocessed_dataset/seqFISH_single_cell.pkl\"\n", "\n", "seed = 10\n", "device = torch.device(\"cuda:0\") if torch.cuda.is_available() else torch.device(\"cpu\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1. Load preprocessed SeqFISH and single-cell reference datasets. \n", "Retrieve leiden clusters if uncertainty estimation is required" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "\n", "with open(pre_datapath, 'rb') as infile:\n", " spa_adata, scrna_adata, raw_spatial_df, raw_scrna_df, raw_shared_gene = pickle.load(infile)\n", " cls_key = 'leiden'\n", " classes = scrna_adata.obs[cls_key]\n", " ct_list = np.unique(classes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2. Select genes from single cell reference data\n", "#### 2.1 Select top-1000 highly variable genes" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 32844 × 29452\n", " obs: 'cell', 'barcode', 'sample', 'pool', 'stage', 'sequencing.batch', 'theiler', 'doub.density', 'doublet', 'cluster', 'cluster.sub', 'cluster.stage', 'cluster.theiler', 'stripped', 'celltype', 'colour', 'sizeFactor', 'leiden'\n", " var: 'ENSEMBL', 'SYMBOL', 'SymbolUniq', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'\n", " uns: 'log1p', 'hvg'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sc.pp.highly_variable_genes(scrna_adata, n_top_genes=1000)\n", "scrna_adata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2.2 Uniformly select 1000 more genes\n", "Uniformly select 1000 more genes from single-cell reference, take union between highly-variable and random genes, and exclude probed 351 genes to retain the final unprobed genes" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "802 valid hv sc genes\n" ] }, { "data": { "text/plain": [ "1754" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_genes = np.setdiff1d(scrna_adata.var_names[scrna_adata.var['highly_variable']], raw_shared_gene)\n", "n_valid_hc_sc_genes = len(test_genes)\n", "print(len(test_genes), \" valid hv sc genes\")\n", "np.random.seed(42)\n", "test_genes = np.setdiff1d(np.union1d(test_genes, np.random.choice(scrna_adata.var_names, 1000, replace=False)), raw_shared_gene)\n", "len(test_genes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3. Run TransImpLR with uncertainty score estimation\n", "\n", "Run low-rank mode, raw cell as gene profile, and turn on uncertainty estimation with 200 local bootstrap samples." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[TransImp] Epoch: 2000/2000, loss: 0.781033, (IMP) 0.781033: 100%|██████████| 2000/2000 [00:32<00:00, 61.25it/s]\n" ] } ], "source": [ "res = expTransImp(\n", " df_ref=raw_scrna_df,\n", " df_tgt=raw_spatial_df,\n", " train_gene=raw_shared_gene,\n", " test_gene=np.concatenate([raw_shared_gene, test_genes]),\n", " n_simulation=200,\n", " signature_mode='cell',\n", " mapping_mode='lowrank',\n", " classes=classes,\n", " n_epochs=2000,\n", " seed=seed,\n", " device=device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4. Prepare adata object for imputation" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "imp_adata = sc.AnnData(res[0])\n", "imp_adata.var_names = np.concatenate([raw_shared_gene, test_genes])\n", "imp_adata.obs = spa_adata.obs.copy()\n", "imp_adata.obsm = spa_adata.obsm\n", "imp_adata.obsp = spa_adata.obsp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5. Compute spatial autocorrelation with Squidpy" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "sq.gr.spatial_autocorr(\n", " imp_adata,\n", " n_jobs=10,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Step 6. Rank unprobed genes by uncertainty scores (sigmoid transformed)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| \n", " | pred_var | \n", "
|---|---|
| Rplp0 | \n", "0.498482 | \n", "
| Rpl13 | \n", "0.498531 | \n", "
| Rps27a | \n", "0.498591 | \n", "
| Rps6 | \n", "0.498615 | \n", "
| Rps14 | \n", "0.498619 | \n", "
| ... | \n", "... | \n", "
| Gm15290 | \n", "0.500156 | \n", "
| Gm14344 | \n", "0.500156 | \n", "
| Gm14243 | \n", "0.500156 | \n", "
| Gm14174 | \n", "0.500156 | \n", "
| Gm29129 | \n", "0.500156 | \n", "
1754 rows × 1 columns
\n", "