Genome Foundation Model-Based Embedding Topic Model for scATAC-seq Modeling (Conference Version Published in RECOMB 2024)

GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling

Preprint Link: https://www.biorxiv.org/content/10.1101/2023.11.09.566403v1.full.pdf

RECOMB 2024 Conference version: https://dl.acm.org/doi/10.1007/978-1-0716-3989-4_20

[Figure: GFETM framework]

Environment Configuration and Installation

git clone https://github.com/fym0503/GFETM.git
cd GFETM
conda create -n GFETM_env python=3.8.10
conda activate GFETM_env
pip install -r requirements.txt
pip install -e .

Dataset Preparation

We provide a sample dataset on Google Drive, as described in data. To preprocess your own datasets, follow the notebook at https://github.com/fym0503/GFETM/blob/main/scripts/general/data_preprocess.ipynb. Before preprocessing, make sure your dataset is in .h5ad format and that its .var table contains the columns 'chr', 'start', and 'end', giving the chromosome, start position, and end position of each peak.
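As a quick sanity check before running the preprocessing notebook, the sketch below verifies that an .h5ad file carries the three peak-coordinate columns. It is not part of GFETM: the helper names (`missing_peak_columns`, `invalid_peak_rows`, `check_h5ad`) are hypothetical, and the `start < end` check is our own assumption about well-formed peaks rather than a documented requirement. It relies on the `anndata` package for reading the file.

```python
REQUIRED_VAR_COLUMNS = ("chr", "start", "end")


def missing_peak_columns(var_columns):
    """Return the required peak-coordinate columns absent from var_columns."""
    present = set(var_columns)
    return [name for name in REQUIRED_VAR_COLUMNS if name not in present]


def invalid_peak_rows(starts, ends):
    """Return indices of peaks whose start is not strictly less than end.

    Assumption (not stated in the GFETM docs): a valid peak has start < end.
    """
    return [i for i, (s, e) in enumerate(zip(starts, ends)) if int(s) >= int(e)]


def check_h5ad(path):
    """Load an .h5ad file and raise ValueError if peak coordinates look wrong."""
    # Third-party dependency, imported lazily so the helpers above
    # remain usable without anndata installed.
    import anndata

    adata = anndata.read_h5ad(path)
    missing = missing_peak_columns(adata.var.columns)
    if missing:
        raise ValueError(f"{path}: .var is missing columns {missing}")
    bad = invalid_peak_rows(adata.var["start"], adata.var["end"])
    if bad:
        raise ValueError(f"{path}: {len(bad)} peaks have start >= end")
    return adata
```

For example, `adata = check_h5ad("my_dataset.h5ad")` (a placeholder file name) returns the loaded object if the checks pass and raises a `ValueError` naming the problem otherwise.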

Tutorials

We provide a tutorial at https://github.com/fym0503/GFETM/blob/main/scripts/tutorial_minimal.ipynb that walks through experiments on the human HSC dataset.

Reproducibility

We provide scripts for reproducing figures from our study at https://github.com/fym0503/GFETM/blob/main/reproducibility/.

Contact

The full paper is still under review. If you have any questions about the code, feel free to open an issue or email [email protected]
