Cell type deconvoluation & ST velocity
Key APIs
expDeconv
expVeloImp
External packages:
Data download
Input data can be downloaded from Mouse_brain.tar.gz, please extract it with
tar -xzvf Mouse_brain.tar.gz
and change the variables RNA_PATH and ST_PATH in the next cell accordingly.
I. Cell type deconvoluation
Translate cell types from single cell reference to ST data (full-rank cluster mode)
In addition to shared genes,
TransDeconvrequires single cell type lables and per-cell annotations as input;After fitting the translation function,
TransDeconvwill output the predictions as well as the weights for each cell type
preds, weight = expDeconv(
adata_ref=rna_adata,
adata_tgt=HybISS,
seed=seed,
device=device
)
[1]:
import scvelo as scv
import scanpy as sc
import numpy as np
import pandas as pd
from transpa.util import expVeloImp, leiden_cluster, expDeconv, build_adata, run_velocity
import torch
import warnings
print(scv.__version__)
print(sc.__version__)
warnings.filterwarnings('ignore')
seed = 10
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
# load preprocessed scRNA-seq and spatial datasets
RNA_PATH = 'Mouse_brain/RNA_adata.h5ad'
ST_PATH = 'Mouse_brain/HybISS_adata.h5ad'
RNA = sc.read_h5ad(RNA_PATH)
HybISS = sc.read_h5ad(ST_PATH)
RNA, HybISS
0.3.4
1.11.5
[1]:
(AnnData object with n_obs × n_vars = 40733 × 16907
obs: 'CellID', 'Age', 'Region', 'Tissue', 'Class', 'Subclass', 'initial_size_spliced', 'initial_size_unspliced', 'initial_size', 'n_counts'
var: 'GeneName', 'gene_count_corr'
layers: 'spliced', 'unspliced',
AnnData object with n_obs × n_vars = 4628 × 119
obs: 'n_counts'
var: 'GeneName'
obsm: 'X_xy_loc', 'xy_loc')
The following code prepares input for expDeconv, it
filters SC types with less than 2 samples
prepare cell cluster labels and annotations
[2]:
sc.pp.pca(HybISS, n_comps=30)
celltype_counts = RNA.obs['Class'].value_counts()
celltype_drop = celltype_counts.index[celltype_counts < 2]
rna_adata = RNA.copy()
if len(celltype_drop):
print(f'Drop celltype {list(celltype_drop)} (contain less 2 samples)')
rna_adata = RNA[~RNA.obs['Class'].isin(celltype_drop),].copy()
classes = rna_adata.obs.Class.values
ct_list = np.unique(classes)
Run the main cell type deconvoluation method
Normalize result and select the top1 cell type
[3]:
preds, weight = expDeconv(
adata_ref=rna_adata,
adata_tgt=HybISS,
label_key='Class',
seed=seed,
device=device
)
df_results = pd.DataFrame(weight, columns=ct_list)
df_results = (df_results.T/df_results.sum(axis=1)).T
HybISS.obs['Class'] = df_results.columns[np.argmax(df_results.values, axis=1)]
[expDeconv] 40733 ref cells, 4628 tgt spots, 19 cell types, 117 shared genes
[expDeconv] Auto-detected raw counts: False
[expDeconv] Selected 117 marker genes (topk=50, 19 cell types)
[expDeconv] Leiden clustering: 28 clusters
[expDeconv] Augmented target with cluster means -> 4656 total rows (4628 original spots)
[expDeconv] Score-init weights computed (4656 spots x 19 cell types)
[expDeconv] Starting fit_deconv (n_epochs=8000, lr=0.01, wd=0.001, wt_js=2.0)
[fit_deconv] Epoch 1000/8000, loss: 6.937523
[fit_deconv] Epoch 2000/8000, loss: 3.534264
[fit_deconv] Epoch 3000/8000, loss: 2.646825
[fit_deconv] Epoch 4000/8000, loss: 2.290432
[fit_deconv] Epoch 5000/8000, loss: 2.118853
[fit_deconv] Epoch 6000/8000, loss: 2.031667
[fit_deconv] Epoch 7000/8000, loss: 1.992047
[fit_deconv] Epoch 8000/8000, loss: 1.984471
[expDeconv] Training complete
[expDeconv] Prediction done: 4656 spots x 117 cell types
Plot the deconvolution result
[4]:
HybISS.obsm['spatial'] = HybISS.obsm['X_xy_loc'] * np.array([[1, -1]])
sc.pl.spatial(HybISS, color='Class', title="CellType Deconvolution Result (Top-1 Cell Type)", spot_size=0.8)
II. RNA velocity imputation
Translate spliced and unspliced count matrices from SC to ST domain (Low-rank cell mode), before applying velocity estimation methods.
Provide
adata_raw(pre-normalisation AnnData) so that S/U are extracted automaticallyOutput is imputed Spliced, Unspliced and other related matrices
res = expVeloImp(
adata_ref=RNA,
adata_tgt=HybISS,
test_gene=RNA.var_names,
signature_mode='cluster',
n_epochs=2000,
seed=seed,
device=device,
adata_raw=RNA_raw)
S, U, V, X = res
Check spliced and unspliced expressions
[5]:
scv.pl.proportions(RNA, 'Class')
RNA_raw = RNA.copy()
scv.pp.filter_and_normalize(RNA, retain_genes=HybISS.var_names)
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize spliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize unspliced as it looks processed already. To enforce normalization, set `enforce=True`.
SC RNA Velocity
[6]:
run_velocity(RNA, mode='stochastic')
scv.pl.velocity_embedding_grid(RNA,
vkey="stc_velocity",
basis='X_umap', color="Class",
legend_fontsize=5,
size=50,
legend_loc='right',
arrow_length=2,
dpi=100,
title='RNA Velocity')
computing neighbors
finished (0:00:23) --> added
'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
finished (0:01:17) --> added
'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
finished (0:04:04) --> added
'stc_velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 10/96 cores)
WARNING: Unable to create progress bar. Consider installing `tqdm` as `pip install tqdm` and `ipywidgets` as `pip install ipywidgets`,
or disable the progress bar using `show_progress_bar=False`.
finished (0:02:59) --> added
'stc_velocity_graph', sparse matrix with cosine correlations (adata.uns)
--> added 'stc_velocity_length' (adata.obs)
--> added 'stc_velocity_confidence' (adata.obs)
--> added 'stc_velocity_confidence_transition' (adata.obs)
computing velocity embedding
finished (0:00:12) --> added
'stc_velocity_umap', embedded velocity vectors (adata.obsm)
Prepare data
[7]:
print(f'RNA: {RNA.shape}, HybISS: {HybISS.shape}')
RNA: (40733, 16907), HybISS: (4628, 119)
Run main imputation method for ST velocity estimation
[8]:
res = expVeloImp(
adata_ref=RNA,
adata_tgt=HybISS,
test_gene=RNA.var_names,
signature_mode='cell',
mapping_mode='lowrank',
classes=classes,
ct_list=ct_list,
label_key='Class',
n_epochs=1000,
seed=seed,
device=device,
adata_raw=RNA_raw)
S, U, V, X = res
Organize new anndata for imputation result
[9]:
imp_adata = build_adata(S, U, X, HybISS, RNA)
run_velocity(imp_adata)
computing neighbors
finished (0:00:00) --> added
'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
Normalized count data: X, spliced, unspliced.
computing moments based on connectivities
finished (0:00:07) --> added
'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
finished (0:00:11) --> added
'stc_velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 10/96 cores)
finished (0:00:24) --> added
'stc_velocity_graph', sparse matrix with cosine correlations (adata.uns)
--> added 'stc_velocity_length' (adata.obs)
--> added 'stc_velocity_confidence' (adata.obs)
--> added 'stc_velocity_confidence_transition' (adata.obs)
[9]:
AnnData object with n_obs × n_vars = 4627 × 16907
obs: 'n_counts', 'Class', 'leiden', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'stc_velocity_self_transition', 'stc_velocity_length', 'stc_velocity_confidence', 'stc_velocity_confidence_transition'
var: 'gene_count_corr', 'mean', 'std', 'stc_velocity_gamma', 'stc_velocity_qreg_ratio', 'stc_velocity_r2', 'stc_velocity_genes'
uns: 'pca', 'Class_colors', 'neighbors', 'umap', 'leiden', 'stc_velocity_params', 'stc_velocity_graph', 'stc_velocity_graph_neg'
obsm: 'X_xy_loc', 'xy_loc', 'X_pca', 'spatial', 'X_umap'
layers: 'spliced', 'unspliced', 'Ms', 'Mu', 'stc_velocity', 'variance_stc_velocity'
obsp: 'distances', 'connectivities'
Ploting ST velocity
[10]:
scv.pl.proportions(imp_adata, 'Class')
[11]:
scv.pl.velocity_embedding_grid(imp_adata,
vkey="stc_velocity",
basis='xy_loc', color="Class",
legend_loc='right',
scale=0.5,
arrow_size=1.5,
arrow_length=2,
size=30,
alpha=0.8,
dpi=150,
title='ST velocity')
computing velocity embedding
finished (0:00:01) --> added
'stc_velocity_xy_loc', embedded velocity vectors (adata.obsm)
[ ]: