Cue2: a deep learning framework for SV calling and genotyping

Installation

1. Clone the repository: $> git clone git@github.com:PopicLab/cue2.git

2. Navigate into the cue2 folder: $> cd cue2

3. Set up a Python virtual environment (recommended)

Create the virtual environment (in the env directory): $> python3.9 -m venv env
Activate the environment: $> source env/bin/activate

4. Install the framework: $> pip install .

5. Set the PYTHONPATH: $> export PYTHONPATH=${PYTHONPATH}:/path/to/cue2 (replace /path/to/cue2 with the absolute path to the cloned cue2 folder)

6. Download the latest pre-trained Cue models from this Google Cloud Storage bucket
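To confirm that steps 3 and 5 took effect, you can check the interpreter from inside the activated environment. The sketch below is a hypothetical helper and not part of cue. It assumes that the Python 3.9 used in step 3 is the intended version and that `repo_dir` is the cloned cue2 folder. It returns a list of warnings, and an empty list means the checks passed.

```python
import os
import sys
from pathlib import Path


def check_environment(repo_dir, expected=(3, 9)):
    """Hypothetical helper: list problems with the venv/PYTHONPATH setup (empty list = OK)."""
    problems = []
    # Step 3 creates the venv with python3.9; other versions may or may not work.
    if sys.version_info[:2] != expected:
        found = ".".join(map(str, sys.version_info[:2]))
        problems.append(f"running Python {found}, step 3 uses {expected[0]}.{expected[1]}")
    # Inside a venv, sys.prefix points at the env directory instead of the base install.
    if sys.prefix == sys.base_prefix:
        problems.append("no virtual environment is active")
    # Step 5 appends the cue2 checkout to PYTHONPATH (entries are os.pathsep-separated).
    entries = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    if Path(repo_dir).resolve() not in {Path(p).resolve() for p in entries}:
        problems.append(f"{repo_dir} is not on PYTHONPATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("warning:", problem)
```

Run it with the venv active, passing the cue2 folder as the argument. Any printed warning points to the step that needs revisiting.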

User guide

Execution

To call structural variants: $> cue call --config </path/to/config>
To train a new model: $> cue train --config </path/to/config>
To generate a training dataset: $> cue generate --config </path/to/config>

Each cue command accepts a YAML file with configuration parameters. Template config files are provided in the docs/config_templates/ directory.

The key parameters for each cue command are listed below.

call:

bam [required] path to the alignments file (BAM/CRAM format)
fa [required] path to the reference FASTA file
chr_names [optional] list of chromosomes to process: null (all) or a specific list e.g. ["chr1", "chr21"] (default: null)
model_path [required] path to the pretrained Cue model (recommended: the latest available model)
gpu_ids [optional] list of GPU IDs to use for calling (default: empty, in which case CPUs are used)
n_jobs_per_gpu [optional] number of parallel jobs to launch on the same GPU (default: 1)
n_cpus [optional] number of CPUs to use for calling if no GPUs are listed (default: 1)
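The call parameters map directly onto keys of the YAML config. The sketch below is hypothetical. The helper names, placeholder paths, and the job-count rule are assumptions and not cue's API. It checks the three required keys, fills in the documented defaults, and renders a minimal YAML body. It also spells out the assumed interaction between the GPU and CPU settings. When `gpu_ids` is non-empty, `len(gpu_ids) * n_jobs_per_gpu` jobs run. Otherwise, `n_cpus` workers are used. In practice, start from the template in docs/config_templates/, because the templates may carry keys that are not listed here.

```python
import json

# Key names and defaults are taken from the README's "call" parameter list.
CALL_REQUIRED = ("bam", "fa", "model_path")
CALL_DEFAULTS = {"chr_names": None, "gpu_ids": (), "n_jobs_per_gpu": 1, "n_cpus": 1}


def yaml_value(value):
    """Render None, bools, ints, strings, and flat lists as YAML flow values."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return json.dumps(value)  # a JSON string is a valid double-quoted YAML scalar
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(yaml_value(v) for v in value) + "]"
    raise TypeError(f"unsupported config value: {value!r}")


def make_call_config(**params):
    """Hypothetical helper: check required keys, then apply the documented defaults."""
    missing = [key for key in CALL_REQUIRED if key not in params]
    if missing:
        raise ValueError(f"missing required call parameters: {missing}")
    config = {key: params[key] for key in CALL_REQUIRED}
    for key, default in CALL_DEFAULTS.items():
        config[key] = params.get(key, default)
    # Keep any extra keys (e.g. ones copied from a template) untouched.
    config.update({k: v for k, v in params.items() if k not in config})
    return config


def parallel_jobs(config):
    """Assumed rule: n_jobs_per_gpu jobs per listed GPU, else n_cpus CPU workers."""
    if config["gpu_ids"]:
        return len(config["gpu_ids"]) * config["n_jobs_per_gpu"]
    return config["n_cpus"]


def to_yaml(config):
    return "".join(f"{key}: {yaml_value(value)}\n" for key, value in config.items())


if __name__ == "__main__":
    # Placeholder paths; replace with your own BAM/CRAM, FASTA, and downloaded model.
    cfg = make_call_config(bam="sample.bam", fa="ref.fa", model_path="path/to/model",
                           chr_names=["chr1", "chr21"], gpu_ids=[0, 1], n_jobs_per_gpu=2)
    print(to_yaml(cfg), end="")
    print("parallel jobs:", parallel_jobs(cfg))  # 2 GPUs x 2 jobs each
```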

train:

dataset_dirs [required] list of annotated imagesets to use for training
dataset_lens [required] list containing the number of images to select from each imageset listed in dataset_dirs
gpu_ids [optional] GPU ID to use for training (default: a CPU is used if empty)
report_interval [optional] frequency (in number of batches) for reporting training stats and image predictions (default: 50)
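`dataset_dirs` and `dataset_lens` are parallel lists, and `report_interval` is counted in batches. The hypothetical sketch below makes both points concrete. It pairs each imageset with its image count and checks that the lists line up. It also estimates how many reports one pass over the data produces. The batch size is an assumed training setting that is not among the parameters listed above, and the one-report-per-interval rule is likewise an assumption.

```python
# Hypothetical helpers mirroring the README's "train" parameters.


def plan_training_set(dataset_dirs, dataset_lens):
    """Pair each imageset with the number of images to draw from it."""
    if len(dataset_dirs) != len(dataset_lens):
        raise ValueError("dataset_dirs and dataset_lens must have the same length")
    if any(n < 0 for n in dataset_lens):
        raise ValueError("dataset_lens entries must be non-negative")
    return list(zip(dataset_dirs, dataset_lens)), sum(dataset_lens)


def reports_per_epoch(total_images, batch_size, report_interval=50):
    """Count reports in one pass, assuming one report every report_interval batches.

    batch_size is an assumed setting; it is not among the README's train parameters.
    """
    n_batches = -(-total_images // batch_size)  # ceiling division
    return n_batches // report_interval


if __name__ == "__main__":
    # Placeholder imageset paths.
    plan, total = plan_training_set(["imagesets/sample1", "imagesets/sample2"], [20000, 5000])
    print(plan, total)
    print(reports_per_epoch(total, batch_size=16))
```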

generate:

bam [required] path to the alignments file (BAM/CRAM format)
vcf [required] path to the ground truth SV VCF file
fa [required] path to the reference FASTA file
n_cpus [optional] number of CPUs to use for image generation, which is parallelized by chromosome (default: 1)
chr_names [optional] list of chromosomes to process: null (all) or a specific list e.g. ["chr1", "chr21"] (default: null)
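`chr_names` accepts either null or an explicit list, and image generation is parallelized by chromosome. The hypothetical sketch below shows one way to interpret both settings. A null value expands to every sequence named in the reference's samtools `.fai` index, where the first tab-separated column holds the sequence name. An explicit list is checked against that index. The number of CPUs that can be kept busy is capped by the number of chromosomes. Reading the `.fai` file and the worker cap are assumptions about the behavior, not a description of cue's code.

```python
# Hypothetical helpers; cue's own chromosome handling may differ.


def fai_sequence_names(fai_text):
    """Sequence names are the first tab-separated column of a samtools .fai index."""
    return [line.split("\t", 1)[0] for line in fai_text.splitlines() if line.strip()]


def resolve_chr_names(chr_names, available):
    """Expand null (None) to all sequences; otherwise validate the requested list."""
    if chr_names is None:
        return list(available)
    known = set(available)
    unknown = [name for name in chr_names if name not in known]
    if unknown:
        raise ValueError(f"not in the reference: {unknown}")
    return list(chr_names)


def busy_workers(chroms, n_cpus=1):
    """Assumed cap: with one task per chromosome, CPUs beyond len(chroms) sit idle."""
    return max(1, min(n_cpus, len(chroms)))
```

Under this assumption, setting `chr_names: ["chr21"]` with `n_cpus: 8` would keep only one CPU busy.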

Recommended workflow

1. Create a new directory for the experiment.
2. Place the YAML config file in this directory (see the templates in docs/config_templates/).
3. Populate the YAML config file with the parameters specific to this experiment.
4. Execute the appropriate cue command, providing the path to the newly configured YAML file. cue automatically creates auxiliary directories containing the results in the folder where the config YAML file is located.
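The workflow steps above can be scripted. The following is a hypothetical sketch and not part of cue. The helper `set_up_experiment` and the default file name `config.yaml` are illustrative. The helper creates the experiment directory, writes the config text (for example, a filled-in template), and builds the matching `cue <command> --config <path>` invocation. It runs the command only when `run=True`. Because cue writes its auxiliary result directories next to the config file, the outputs end up inside the experiment directory.

```python
import subprocess
from pathlib import Path

CUE_COMMANDS = {"call", "train", "generate"}


def set_up_experiment(exp_dir, config_text, command="call", config_name="config.yaml", run=False):
    """Hypothetical helper: one directory per experiment, config inside, cue pointed at it."""
    if command not in CUE_COMMANDS:
        raise ValueError(f"unknown cue command: {command}")
    exp_dir = Path(exp_dir)
    exp_dir.mkdir(parents=True, exist_ok=True)  # create the experiment directory
    config_path = exp_dir / config_name
    config_path.write_text(config_text)  # place and populate the YAML config
    cmd = ["cue", command, "--config", str(config_path)]
    if run:
        # cue creates its result directories next to config_path, i.e. inside exp_dir
        subprocess.run(cmd, check=True)
    return config_path, cmd
```

Keeping `run=False` by default lets you inspect the generated command before launching a long calling or training job.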
