Modanovo

Modanovo is a de novo peptide sequencing tool for peptides carrying post-translational modifications (PTMs), built on top of Casanovo (v4.0.0).


Installation

We recommend using a fresh conda environment with Python 3.10.

conda create --name modanovo-env python=3.10
conda activate modanovo-env

Install the dependencies:

  1. PyTorch (pick the command matching your CUDA/CPU setup from the PyTorch site; generic example):
pip3 install torch
  2. Depthcharge-MS (pinned commit):
pip install git+https://github.com/wfondrie/depthcharge.git@bd2861f
  3. Clone this repository:
git clone https://github.com/gagneurlab/Modanovo.git
cd Modanovo
  4. Install Modanovo:
pip install .

For development (editable install):

pip install -e .
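
After installing, a quick check can confirm that the environment is usable. The following is a minimal sketch, assuming that the package puts a `modanovo` command on PATH (as the usage commands in this README suggest); the helper name `have` is hypothetical and not part of Modanovo.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Check that the modanovo entry point is reachable from the active environment.
if have modanovo; then
  echo "modanovo found at: $(command -v modanovo)"
else
  echo "modanovo not found on PATH; is modanovo-env activated?"
fi

# Report the installed PyTorch version and whether a CUDA device is visible.
if have python; then
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())" \
    || echo "PyTorch could not be imported"
fi
```

If the second value printed by the PyTorch check is `False`, PyTorch sees no CUDA device and will run on the CPU.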

Usage

Modanovo supports three modes: training, evaluation, and inference.

Training

Train a new model from scratch:

modanovo train -c <config_path> -p <val_paths> <train_paths>

Fine-tune from pretrained Casanovo weights:

modanovo train -c <config_path> -m <model_path> -p <val_paths> <train_paths>

Here, <model_path> points to the pretrained Casanovo v4.0.0 weights.
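
Because training runs can be long, it may help to confirm that every input file exists before launching. The sketch below wraps the fine-tuning command in such a check. All four paths are hypothetical placeholders, the `.mgf` extension is an assumption carried over from Casanovo's annotated-MGF training input, and `require_files` is an illustrative helper, not part of Modanovo.

```shell
# Hypothetical paths; replace with your own files.
CONFIG="modanovo/config.yaml"
MODEL="path/to/casanovo_v4_0_0.ckpt"
VAL="data/val.mgf"       # assumed file format
TRAIN="data/train.mgf"   # assumed file format

# Return 0 only if every given file exists; report the first missing one.
require_files() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing input: $f" >&2
      return 1
    fi
  done
}

# Launch fine-tuning only when all inputs are present.
if require_files "$CONFIG" "$MODEL" "$VAL" "$TRAIN"; then
  modanovo train -c "$CONFIG" -m "$MODEL" -p "$VAL" "$TRAIN"
fi
```

Dropping `-m "$MODEL"` (and the corresponding check) gives the train-from-scratch variant.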

Evaluation

Evaluate a trained model on validation/test spectra:

modanovo evaluate -c <config_path> -m <model_path> -p <val_paths>

Inference

Run Modanovo in inference mode:

modanovo sequence -c <config_path> -m <model_path> -o <out_path>

This writes the predicted peptide sequences to <out_path> in mzTab format.
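
mzTab is a tab-separated text format in which each line starts with a section prefix: `MTD` for metadata, `PSH` for the PSM table header, and `PSM` for the individual peptide-spectrum matches. As a minimal sketch, the predictions can be pulled out as a plain table with `awk`. The helper name `mztab_psms` and the path in `OUT` are hypothetical, and the columns that Modanovo writes are not listed here, so inspect the `PSH` line before relying on a column position.

```shell
# Hypothetical helper: print the PSM header (PSH) and PSM rows of an mzTab file.
mztab_psms() {
  awk -F'\t' '$1 == "PSH" || $1 == "PSM"' "$1"
}

# Placeholder output path; use whatever was passed to -o.
OUT="outputs/predictions.mztab"
if [ -f "$OUT" ]; then
  # Show the header and the first few predictions.
  mztab_psms "$OUT" | head -n 5
fi
```

The result is itself tab-separated, so it can be redirected to a `.tsv` file and opened in a spreadsheet or loaded with a dataframe library.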


Quickstart (example)

Assuming you’ve installed Modanovo and have a model checkpoint:

# from the repo root
modanovo sequence \
  -c modanovo/config.yaml \
  -m path/to/casanovo_or_modanovo_weights.ckpt \
  -o outputs/predictions.mztab

Make sure that the residues defined in the configuration file are compatible with the model weights. If the expanded_residues entry is left empty, Casanovo's tokens are used. By default, the fine-tuning residues are those of the MULTI-PTM dataset in PROSPECT-PTM.
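
Before running with a given checkpoint, it can be useful to look at what the configuration file currently sets for `expanded_residues`. The sketch below prints that entry and the lines that follow it. It assumes the entry appears as a YAML key at the start of a line (optionally indented); the exact layout of the entry is not described here, and `show_config_entry` is a hypothetical helper.

```shell
# Hypothetical helper: print a config key and up to 10 lines that follow it.
# $1 = path to the YAML config, $2 = key name
show_config_entry() {
  grep -n -A 10 "^[[:space:]]*$2:" "$1"
}

CONFIG="modanovo/config.yaml"   # path used in the quickstart command
if [ -f "$CONFIG" ]; then
  show_config_entry "$CONFIG" expanded_residues \
    || echo "no expanded_residues entry found in $CONFIG"
fi
```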


Example data & configs


Compatibility

  • Compatible with Casanovo v4.0.0 weights and formats.

References

  • Casanovo: Yilmaz, Melih, William E Fondrie, Wout Bittremieux, et al. 2024. “Sequence-to-Sequence Translation from Mass Spectra to Peptides with a Transformer Model.” Nature Communications 15 (1): 6427.

  • PROSPECT-PTM: Gabriel, Wassim, Omar Shouman, Ayla Schroeder, Florian Boessl, and Mathias Wilhelm. 2024. “PROSPECT PTMs: Rich Labeled Tandem Mass Spectrometry Dataset of Modified Peptides for Machine Learning in Proteomics.” Advances in Neural Information Processing Systems 37.


Citation

If you use Modanovo in your research, please cite:

FIXME
