CSU-MS2

This is the code repository for the paper "Contrastively Spectral-structural Unification between MS/MS Spectra and Molecular Structures Enabling Cross-Modal Retrieval for Compound Identification". We developed a method named CSU-MS2 that performs cross-modal matching of MS/MS spectra against molecular structures for compound identification.

Package required:

We recommend using conda and pip.

Installation

The main packages are listed in requirements.txt.

Install Anaconda: https://www.anaconda.com/

Install the main packages in requirements.txt with the following commands:

 conda create --name CSU-MS2 python=3.8.18
 conda activate CSU-MS2
 python -m pip install -r requirements.txt

Model training

Train the model on your own structure-spectrum training dataset with run.py. Multi-GPU or multi-node parallel training can be performed using the Distributed Data Parallel (DDP) support provided in the code.

main(rank, world_size, num_gpus, rank_is_set, ds_args)
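
To illustrate what the rank and world_size arguments mean under DDP, here is a minimal pure-Python sketch of how a dataset is typically split across processes. It mirrors the strided partitioning used by PyTorch's DistributedSampler with shuffling disabled. The helper name shard_indices is hypothetical and is not part of run.py; the actual data loading in this repository may differ.

```python
import math


def shard_indices(dataset_size, rank, world_size):
    """Return the sample indices one process would handle (hypothetical helper)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    if dataset_size == 0:
        return []
    # Every rank must receive the same number of samples, so round up.
    per_rank = math.ceil(dataset_size / world_size)
    total = per_rank * world_size
    indices = list(range(dataset_size))
    # Pad by repeating samples from the start of the dataset.
    padding = total - len(indices)
    indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Strided slice: rank r takes positions r, r + world_size, ...
    return indices[rank:total:world_size]


if __name__ == "__main__":
    for rank in range(4):
        print(rank, shard_indices(10, rank, 4))
```

Each process sees a disjoint slice of the data (apart from the padded repeats), which is why the effective batch size grows with the number of GPUs.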

Library searching

Search a SMILES library with search_library.py. Users can load the model for a single collision energy level according to their collision energy setting, or load all three energy-level models and use the weighted scores across energy levels as the final score with search_user_defined_library.py.
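
The weighted-score option can be sketched as follows. This is a minimal illustration, not the code in search_user_defined_library.py: the function names, the level names "medium" and "high", and the weights are all assumptions chosen for the example.

```python
def fuse_energy_scores(scores_by_energy, weights):
    """Weighted average of per-candidate scores from several energy-level models.

    scores_by_energy: dict mapping a level name to a list of scores,
                      one score per candidate, same candidate order in each list.
    weights:          dict mapping the same level names to non-negative weights.
    """
    total_weight = sum(weights[level] for level in scores_by_energy)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    n_candidates = len(next(iter(scores_by_energy.values())))
    fused = [0.0] * n_candidates
    for level, scores in scores_by_energy.items():
        if len(scores) != n_candidates:
            raise ValueError("every level must score the same candidates")
        w = weights[level] / total_weight  # normalise so weights sum to 1
        for j, s in enumerate(scores):
            fused[j] += w * s
    return fused


def rank_candidates(candidates, fused_scores):
    """Sort candidates by fused score, best first."""
    return sorted(zip(candidates, fused_scores), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for two candidate SMILES from three models.
    scores = {"low": [0.9, 0.2], "medium": [0.6, 0.4], "high": [0.3, 0.8]}
    weights = {"low": 0.5, "medium": 0.25, "high": 0.25}
    print(rank_candidates(["CCO", "COC"], fuse_energy_scores(scores, weights)))
```

Normalising the weights keeps the fused score on the same scale as the individual model scores, so results remain comparable whether one or three models are used.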

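# Example: cross-modal retrieval with a single collision-energy model
import os
import pandas as pd
from tqdm import tqdm
from matchms.importing import load_from_mgf
# ModelInference, spectrum_processing, search_structure_from_mass, get_feature
# and get_topK_result are helpers from this repository's code; import them before running.

config_path = "/model/low_energy/checkpoints/config.yaml"
single_collision_energy_pretrain_model_path = "/model/low_energy/checkpoints/checkpoints/model.pth"
model_inference = ModelInference(config_path=config_path,
                             pretrain_model_path=single_collision_energy_pretrain_model_path,
                             device="cpu")
output_file = '.../'
# makedirs with exist_ok=True does not fail when the output folder already exists
os.makedirs(output_file, exist_ok=True)
ms_list = list(load_from_mgf(".../.mgf"))
reference_library = pd.read_csv('...')
for i in tqdm(range(len(ms_list))):
    result = pd.DataFrame(columns=['smiles', 'score'])
    spectrum = ms_list[i]
    spectrum = spectrum_processing(spectrum)
    # Encode the query spectrum into the shared embedding space
    ms_feature = model_inference.ms2_encode(ms_list[i:i+1])
    # Approximate neutral mass: precursor m/z minus 1.008 (assumes an [M+H]+ precursor)
    query_ms = float(spectrum.metadata['precursor_mz']) - 1.008
    # Retrieve candidate structures whose mass matches the query
    search_res = search_structure_from_mass(reference_library, query_ms, 10)
    smiles_lst = list(search_res['SMILES'])
    # Encode the candidate structures
    smiles_feature, smiles_list = get_feature(smiles_lst, save_name=None,
        model_inference=model_inference, n=1, flag_get_value=True)
    # Rank the candidates against the spectrum and keep the top 100
    indice, score, candidate = get_topK_result(library=smiles_list, ms_feature=ms_feature,
                                      smiles_feature=smiles_feature, topK=100)
    result['smiles'] = candidate[0]
    result['score'] = score[0]
    result.to_csv(os.path.join(output_file, 'results' + str(i) + '.csv'))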
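
The two retrieval steps in the example above (mass-based candidate filtering, then ranking by embedding similarity) can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the tolerance is assumed to be in ppm, and scoring is assumed to be cosine similarity, as is common for contrastively trained embeddings.

```python
import math


def filter_by_mass(library, query_mass, ppm):
    """Keep SMILES whose mass lies within +/- ppm of query_mass.

    library: list of (smiles, monoisotopic_mass) pairs.
    """
    tolerance = query_mass * ppm * 1e-6
    return [smiles for smiles, mass in library if abs(mass - query_mass) <= tolerance]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, candidate_vecs, candidates, k):
    """Return the k candidates most similar to the query, best first."""
    scored = [(c, cosine(query_vec, v)) for c, v in zip(candidates, candidate_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy library and toy 2-D embeddings, for illustration only.
    library = [("CCO", 46.0419), ("CC(=O)O", 60.0211), ("COC", 46.0419)]
    print(filter_by_mass(library, 46.0419, 10))
    print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], ["a", "b", "c"], 2))
```

Filtering by mass first keeps the similarity search small, since only structures compatible with the precursor mass need to be encoded and scored.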

CSU-MS2 web server and Dataset

The CSU-MS2 web server and CSU-MS2-DB are hosted on Hugging Face, and can be visited through the following links:

🌐 CSU-MS2 web server: The application interface allows users to upload unknown spectra and access results in real time. Visit the app here: CSU-MS2 web server.
📂 CSU-MS2-DB: Explore the dataset here: CSU-MS2-DB.
