G0-FM is a foundation model that classifies G0, slow-cycling, and fast-cycling cell states in single-cell RNA-seq cancer data. The model has so far been trained on breast cancer data only.
G0-FM/
├── src/ # Source code directory
│ ├── DLM_exp_helpers/ # Helper functions for experiments
│ ├── scLLM/ # Main model implementation
│ └── scLLM_support_data/ # Supporting data and utilities
├── Data/ # Data directory for training and testing
├── Exp/ # Experiment results and configurations
├── Outputs/ # Model outputs and predictions
└── LICENSE # License information
- Python 3.8+
- CUDA-compatible GPU (recommended)
- Conda (recommended for environment management)
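A quick way to confirm that an environment meets these requirements is a short check script. The sketch below is not part of the repository: the function name `check_environment` and the `REQUIRED` list are hypothetical, and the package names are the import names of the packages installed in the setup steps.

```python
import sys
from importlib.util import find_spec

# Import names of the packages installed with pip in this README
# (scikit-learn is imported as `sklearn`). Hypothetical helper list.
REQUIRED = ["torch", "torchvision", "scanpy", "anndata", "sklearn",
            "pandas", "numpy", "matplotlib", "lifelines", "dill", "tqdm"]


def check_environment(min_python=(3, 8), packages=REQUIRED):
    """Report the Python version check, missing packages, and CUDA availability."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "missing": [name for name in packages if find_spec(name) is None],
        "cuda": None,  # stays None when torch is not installed
    }
    if find_spec("torch") is not None:
        import torch
        report["cuda"] = torch.cuda.is_available()
    return report
```

Calling `print(check_environment())` inside the activated environment shows which packages are still missing and whether a CUDA device is visible to PyTorch.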
- Create and activate a new conda environment:
conda create -n g0lm python=3.8
conda activate g0lm
- Install required packages:
pip install torch torchvision
pip install scanpy anndata
pip install scikit-learn pandas numpy matplotlib
pip install lifelines dill tqdm

The project workflow consists of several key steps, each with its own purpose and outputs:
Located in Exp/step0_preprocess/
- Processes raw single-cell RNA sequencing data
- Performs quality control and filtering
- Normalizes data and prepares it for model input
- Outputs preprocessed data in AnnData (.h5ad) format
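The preprocessing steps listed above can be illustrated with a minimal NumPy sketch. It is not the code from the step0 notebooks: the function name `preprocess_counts` and all thresholds (`min_genes`, `min_cells`, `target_sum`) are assumptions chosen for illustration, and the input is assumed to be a dense cells x genes count matrix.

```python
import numpy as np


def preprocess_counts(counts, min_genes=200, min_cells=3, target_sum=1e4):
    """QC-filter, library-size normalize, and log-transform a cells x genes matrix.

    All thresholds are illustrative assumptions, not values taken from G0-FM.
    """
    counts = np.asarray(counts, dtype=float)

    # Quality control: keep cells that express at least `min_genes` genes.
    keep_cells = (counts > 0).sum(axis=1) >= min_genes
    counts = counts[keep_cells]

    # Filtering: keep genes detected in at least `min_cells` remaining cells.
    keep_genes = (counts > 0).sum(axis=0) >= min_cells
    counts = counts[:, keep_genes]

    # Normalization: scale every cell to the same total count, then log1p.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # guard against division by zero
    normalized = np.log1p(counts / totals * target_sum)
    return normalized, keep_cells, keep_genes
```

With scanpy, the same operations are available as `sc.pp.filter_cells`, `sc.pp.filter_genes`, `sc.pp.normalize_total`, and `sc.pp.log1p`, and an AnnData object can be saved with its `write_h5ad` method. Whether the step0 notebooks use exactly these calls is an assumption.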
Located in Exp/step1_train_phase1/
- Initial model training phase
- Trains the model on breast cancer dataset
- Focuses on learning cell cycle state patterns
- Includes model checkpoints and training logs
Located in Exp/step2_train_phase2/
- Fine-tuning phase of the model
- Adapts the model for specific cancer types
- Optimizes performance on G0 state classification
- Generates refined model weights
Located in Exp/step3_embedding_space/
- Analyzes the learned cell state representations
- Visualizes embedding space using dimensionality reduction
- Identifies cell state clusters
- Provides insights into model's internal representations
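As an illustration of the dimensionality-reduction step, the sketch below projects a matrix of cell embeddings onto its top principal components using an SVD. PCA is used here only because it needs nothing beyond NumPy; the step3 notebooks may rely on a different method such as UMAP or t-SNE. The function name `pca_project` is hypothetical, and the embeddings are assumed to be a cells x dimensions array with non-zero variance.

```python
import numpy as np


def pca_project(embeddings, n_components=2):
    """Project cell embeddings onto their top principal components via SVD."""
    x = np.asarray(embeddings, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)  # center each embedding dimension
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(x, full_matrices=False)
    coords = x @ vt[:n_components].T
    # Fraction of total variance captured by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:n_components]
```

The returned coordinates can be passed to a scatter plot (for example with matplotlib) and colored by predicted cell state to look for separate G0, slow-cycling, and fast-cycling clusters.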
Located in Exp/step4_eval/
- Comprehensive model evaluation
- Performs cross-validation
- Generates performance metrics
- Creates visualization of results
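The cross-validation and metric steps above can be sketched as follows. This is a generic illustration rather than the evaluation code in step4: the helper names `stratified_folds` and `macro_f1` are hypothetical, and macro-averaged F1 is only one plausible choice of metric for a three-class problem.

```python
import numpy as np


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign each sample to a fold so every class is spread evenly across folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then deal them out round-robin.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return fold_of


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over the classes in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for cls in np.unique(y_true):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))
```

For each fold `k`, cells with `fold_of == k` form the held-out set and the remaining cells form the training set. scikit-learn's `StratifiedKFold` and `f1_score(average="macro")` provide equivalent, better-tested functionality.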
Each step contains detailed Jupyter notebooks and scripts with specific parameters and configurations. To reproduce results:
- Start with preprocessing your data following notebooks in step0
- Follow the sequential steps (1-4) in the Exp/ directory
- Each step's directory contains README files with specific instructions
- Results and outputs will be saved in the Outputs/ directory
Check the Exp/ directory for example notebooks and scripts demonstrating various use cases and experiments.
This project is licensed under the terms specified in the LICENSE file.
If you use this model in your research, please cite: [Citation information to be added]
For questions and feedback, please contact Shi Pan at UCL Genetics Institute.