UNAGI: Deep Generative Model for Deciphering Cellular Dynamics and In-Silico Drug Discovery in Complex Diseases
Full documentation and tutorials are available at UNAGI-docs.
UNAGI is a comprehensive, unsupervised framework for in-silico analysis of cellular dynamics and drug discovery. It deciphers cellular dynamics from time-series single-cell data of human disease and runs in-silico drug perturbations to nominate therapeutic targets and drugs potentially active against complex human diseases. All outputs, from cellular dynamics to drug perturbations, are rendered in an interactive visual format within the UNAGI framework.
UNAGI is built on a VAE-GAN (variational autoencoder-generative adversarial network) deep learning architecture and is tailored to handle the diverse data distributions that frequently arise after normalization. It uses disease-informed cell embeddings that draw on key gene markers derived from the disease dataset. Once the cell embeddings are learned, UNAGI constructs a graph that chronologically links cell clusters across disease stages, then infers the gene regulatory network that drives these connections. Because it is designed for time-series data, UNAGI can portray cellular dynamics precisely and better capture disease markers and regulators.
Finally, UNAGI's deep generative model powers an in-silico drug perturbation module, which simulates drug effects by manipulating the latent space, informed by real drug perturbation data from the CMAP database. This allows drug efficacy to be assessed by how far cells shift towards healthier states after treatment. The same module can be used to investigate therapeutic pathways, using an approach similar to the drug perturbation analysis.
- Learning disease-specific cell embeddings through iterative training processes.
- Constructing temporal dynamic graphs from time-series single-cell data and reconstructing temporal gene regulatory networks to decipher cellular dynamics.
- Identifying dynamic and hierarchical static markers to profile cellular dynamics, both longitudinally and at specific time points.
- Performing in-silico perturbations to identify potential therapeutic pathways and drug/compound candidates.
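To make the idea of scoring a perturbation by "cellular shifts towards healthier states" concrete, here is a toy sketch. It is not UNAGI's implementation or API: the function names (`centroid`, `perturbation_score`), the Euclidean metric, and the two-dimensional latent vectors are all illustrative assumptions.

```python
import math


def centroid(points):
    """Mean of a list of equal-length latent vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def distance(a, b):
    """Euclidean distance between two vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def perturbation_score(diseased, perturbed, healthy):
    """Fraction of the diseased-to-healthy distance closed by a perturbation.

    1.0 means the perturbed centroid lands on the healthy centroid,
    0.0 means no movement, and negative values mean moving away.
    """
    target = centroid(healthy)
    before = distance(centroid(diseased), target)
    after = distance(centroid(perturbed), target)
    if before == 0:
        raise ValueError("diseased and healthy centroids coincide")
    return (before - after) / before


# Made-up latent embeddings, for illustration only.
healthy = [[0.0, 0.0], [0.2, -0.2]]
diseased = [[2.0, 2.0], [2.2, 1.8]]
perturbed = [[1.0, 1.0], [1.2, 0.8]]

score = perturbation_score(diseased, perturbed, healthy)
print(f"score = {score:.2f}")  # about 0.5: half the gap is closed
```

A score defined this way lets candidate perturbations be ranked on one scale, although a real analysis would also need to account for variability within clusters.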
Create a new conda environment
conda create -n unagi python=3.9
conda activate unagi
Installing UNAGI directly from GitHub ensures you have the latest version. (Please install directly from GitHub to use the provided Jupyter notebooks for tutorials and walkthrough examples.)
git clone https://github.com/mcgilldinglab/UNAGI.git
cd UNAGI
pip install .
- Python>=3.9 (Python 3.9 is recommended)
- pyro-ppl>=1.8.6
- scanpy>=1.9.5
- anndata==0.8.0
- torch>=2.0.0
- matplotlib>=3.7.1
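After installation, the versions listed above can be checked from Python. The following is a minimal sketch using only the standard library; the helper names (`version_tuple`, `satisfies`, `check_requirements`) are illustrative and not part of UNAGI, and the version parsing is deliberately simplified (it ignores pre-release and local-version suffixes).

```python
import re
from importlib import metadata

# Version constraints copied from the list above; anndata is pinned exactly.
REQUIREMENTS = {
    "pyro-ppl": (">=", "1.8.6"),
    "scanpy": (">=", "1.9.5"),
    "anndata": ("==", "0.8.0"),
    "torch": (">=", "2.0.0"),
    "matplotlib": (">=", "3.7.1"),
}


def version_tuple(version):
    """Turn '2.1.0+cu118' into (2, 1, 0), ignoring non-numeric suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Check an installed version against a '>=' or '==' constraint."""
    have, want = version_tuple(installed), version_tuple(required)
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))  # pad so '2.0' compares equal to '2.0.0'
    want += (0,) * (width - len(want))
    return have >= want if op == ">=" else have == want


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: status string} for every listed requirement."""
    report = {}
    for name, (op, required) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        verdict = "ok" if satisfies(installed, op, required) else f"needs {op}{required}"
        report[name] = f"{installed} ({verdict})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```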
Required files
The preprocessed CMAP database (Link) provides the drug-gene pair data needed to run the UNAGI drug perturbation function.
- Option 1: 'cmap_drug_target.npy' uses the direct drug target genes provided in CMAP LINCS 2020.
- Option 2: 'cmap_drug_treated_res_cutoff.npy' uses genes that are significantly up- or down-regulated after individual drug treatments in CMAP LINCS 2020. Only the top 5% of drug-gene pairs, ranked by level 5 MODZ score, are kept.
- 'cmap_direction_df.npy' gives the direction in which each drug regulates each gene after treatment, based on the level 5 MODZ score.
- To use your own drug-target pairs, please see this tutorial.
Preprocessed IPF snRNA-seq dataset: OneDrive
- UNAGI outputs for reproducing the figures and tables in the manuscript.
Example dataset: Link.
- The dataset used in the UNAGI walkthrough demonstration.
iDREM installation:
Install iDREM into the UNAGI source folder:
git clone https://github.com/phoenixding/idrem.git
iDREM prerequisites:
- Java: iDREM requires Java 1.7 or later. If it is not currently installed, please refer to http://www.java.com for installation instructions.
- JavaScript: needed for the interactive visualization. (The software can still be run offline, but Internet access is needed to view the results interactively.)
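The Java prerequisite can be checked before running iDREM. The sketch below is an illustrative helper, not part of UNAGI or iDREM: it assumes `java` is on the PATH and that `java -version` prints a banner containing `version "..."`, which is the usual format for Oracle and OpenJDK builds.

```python
import re
import shutil
import subprocess


def parse_java_version(banner):
    """Extract (major, minor) from a `java -version` banner, or None.

    Handles legacy '1.8.0_292' as well as modern '11.0.2' and '17' styles.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def java_is_new_enough(banner, minimum=(1, 7)):
    """True if the banner reports Java 1.7 or later."""
    version = parse_java_version(banner)
    return version is not None and version >= minimum


def installed_java_banner():
    """Return the `java -version` banner, or None if java is not on PATH."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # java prints its version banner to stderr rather than stdout
    return result.stderr or result.stdout


if __name__ == "__main__":
    banner = installed_java_banner()
    if banner is None:
        print("java not found on PATH")
    elif java_is_new_enough(banner):
        print("Java looks recent enough for iDREM")
    else:
        print("Java is older than 1.7 or its version could not be parsed")
```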
Prepare datasets to run UNAGI.
UNAGI training and analysis on an example dataset.
Visualization on an example dataset.
Run UNAGI on a customized drug/compound database and a customized pathway database.
Predict the post-treatment gene expression changes using the PCLS data.
From loading data to downstream analysis.
Please visit UNAGI-docs for more examples and tutorials.