We introduce two metrics, EPS (Expression Predictability Score) and SPS (Slice Predictability Score), to quantify how predictable gene expression is from histology images. The Python package expression_copilot calculates these metrics efficiently. It also provides several baseline models, such as an MLP and linear regression, that predict gene expression from image embeddings.
Important
Requires Python >= 3.10
We recommend installing expression_copilot in a new conda environment:
conda create -n eps python=3.11 -y && conda activate eps
pip install expression_copilot

(Optional) If you have a CUDA-enabled GPU, you can install cuml and cupy to accelerate KNN building, and install torch to accelerate MLP baseline training:
conda create -n eps_cuda -c conda-forge -c rapidsai -c nvidia python=3.11 rapids=25.06 'cuda-version>=12.0,<=12.8' -y && conda activate eps_cuda
pip install expression_copilot[torch]

You can also use our pre-built Docker image directly:
# GPU version
docker run --gpus all -it --rm huhansan666666/expression_copilot:latest
# CPU version
docker run -it --rm huhansan666666/expression_copilot:latest

The following code snippet shows how to calculate EPS and SPS with the expression_copilot package. We assume you have already preprocessed your spatial transcriptomics data into an AnnData object (adata), where adata.X stores raw counts and adata.obsm['IMAGE_KEY_NAME'] stores the image embeddings of the spots. (The preprocessing steps are described in detail in the Advanced Tutorial.)
import scanpy as sc
import numpy as np
from expression_copilot import ExpressionCopilotModel
# Load data
# adata.X is raw counts
# adata.obsm['X_uni'] stores image embeddings of spots
url = 'https://drive.google.com/uc?id=10WD9vFgsoMoTt6g3017XxNK_bq8qp3oM'
adata = sc.read('./adata_with_image_emb.h5ad', backup_url=url)
# Init model
model = ExpressionCopilotModel(adata, image_key = 'X_uni')
# Calculate EPS and SPS
eps = model.calc_metrics_per_gene()
sps = eps.mean()
# Run baseline model (support 'ridge', 'linear', 'ensemble', 'mlp')
baseline_metrics_per_gene, _ = model.calc_baseline_metrics(method = 'mlp')

We provide several tutorials in the resource/tutorials folder. You can also run them directly in Google Colab:
| Name | Description | Colab |
|---|---|---|
| Basic Tutorial | Basic tutorial on calculating EPS | |
| Advanced Tutorial | Starting from scratch with 10x Space Ranger output | |
| Multi-omics Tutorial | Calculating EPS and SPS on single-cell multi-omics data | |
Coming soon.
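The quick-start snippet expects raw counts in adata.X and one image embedding per spot in adata.obsm. As a minimal sketch, the check below validates that layout on plain NumPy arrays before a model is built. `check_inputs` is a hypothetical helper written for illustration and is not part of expression_copilot; the synthetic arrays only stand in for real data.

```python
import numpy as np


def check_inputs(counts, embeddings):
    """Hypothetical helper (not part of expression_copilot): validate input layout."""
    counts = np.asarray(counts)
    embeddings = np.asarray(embeddings)
    if counts.ndim != 2 or embeddings.ndim != 2:
        raise ValueError("counts and embeddings must both be 2-D (spots x features)")
    if counts.shape[0] != embeddings.shape[0]:
        raise ValueError("counts and embeddings must have one row per spot")
    if (counts < 0).any():
        raise ValueError("raw counts must be non-negative")
    if not np.allclose(counts, np.round(counts)):
        raise ValueError("counts look normalized; raw integer counts are expected")
    if not np.isfinite(embeddings).all():
        raise ValueError("embeddings contain NaN or infinite values")


# Synthetic stand-ins: 100 spots x 50 genes of counts, 100 spots x 16 embedding dims
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(100, 50))
embeddings = rng.normal(size=(100, 16))
check_inputs(counts, embeddings)  # passes silently when the layout is valid
```

With a real AnnData object, the same check could be applied to adata.X and adata.obsm['X_uni']; if adata.X is a SciPy sparse matrix, convert it with `.toarray()` first.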
If you want to reproduce the results in the manuscript, please check the experiments folder.
Please open a new GitHub issue if you have any questions.
numba-related bugs
We use numba to speed up computation (up to 12x). However, it may have compatibility issues with some Python/NumPy versions. We tested the latest version of numba (0.61.2), and it works with Python 3.11/3.12 and numpy 1.26.
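When reporting a suspected numba compatibility problem, it helps to include the versions involved. The following is a minimal sketch that collects them with the standard library only. `report_versions` is a hypothetical helper, not part of expression_copilot, and the default package list is an assumption; some wheels are published under other distribution names (for example, cupy-cuda12x), so a value of None does not always mean the package is missing.

```python
import importlib.metadata as md
import platform


def report_versions(packages=("numba", "numpy", "expression_copilot", "torch", "cupy", "cuml")):
    """Hypothetical helper: map each distribution name to its installed version or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = md.version(name)
        except md.PackageNotFoundError:
            versions[name] = None  # not installed under this distribution name
    return versions


print("python", platform.python_version())
for name, version in report_versions().items():
    print(name, version if version is not None else "not found")
```

Pasting this output into a GitHub issue makes it easier to tell whether a failure comes from an untested Python/numpy/numba combination.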
We thank the following great open-source projects for their help or inspiration: