
robinschmid/microbe_masst



Welcome to domainMASSTs

This repository contains the code and data for the different domain-specific MASSTs currently under development in the Dorrestein Lab at UC San Diego. This includes microbeMASST, plantMASST, tissueMASST, microbiomeMASST, and foodMASST. Aggregated search outputs can be generated and visualized using metadataMASST.

The code for the standalone web applications, which let users search one spectrum at a time, can be found in GNPS_MASST.

Standalone Web Apps:

  1. microbeMASST
  2. plantMASST
  3. tissueMASST
  4. microbiomeMASST
  5. foodMASST
  6. metadataMASST

Publications associated with the search tools:

  1. microbeMASST - Nature Microbiology
  2. plantMASST - bioRxiv
  3. tissueMASST - bioRxiv
  4. microbiomeMASST - bioRxiv
  5. foodMASST - npj Science of Food

Batch search of multiple spectra against all domainMASSTs

Running jobs.py lets users leverage the Fast Search API to batch-search multiple MS/MS spectra against the data currently indexed in GNPS/MassIVE, Metabolomics Workbench, MetaboLights, and NORMAN. It generates the following outputs for all listed domainMASSTs simultaneously:

  1. A series of interactive HTML tree files will be generated, one for each domain-specific MASST, ending with _domain.html (e.g., _microbe.html)
  2. A series of JSON files for the different trees will be generated (e.g., _microbe.json)
  3. A _matches.tsv file will be generated. This contains all the scans found to match your searched spectrum of interest in the currently indexed data. This also includes samples that are not part of the curated domain-specific MASSTs.
  4. A _library.tsv file will be generated. This contains a list of spectra from the GNPS libraries found to match your spectrum of interest. This enables a Level 2 annotation according to the Metabolomics Standards Initiative.
  5. A _datasets.tsv file will be generated. This contains the number of unique samples found to be matching your searched spectrum in each currently indexed dataset.
  6. A series of _count_domain.tsv files will be generated, containing information on matches found for each specific domain MASST.
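
As a rough checklist, the file names described above can be derived from an output prefix. This is a minimal sketch, not code from the repository: only the `microbe` suffix appears in the text, so the other domain names are assumptions modelled on the tool names, and `expected_outputs` and `missing_outputs` are hypothetical helpers.

```python
from pathlib import Path

# Assumption: only "microbe" is confirmed by the examples above; the other
# suffixes are guesses derived from the tool names.
DOMAINS = ["microbe", "plant", "tissue", "microbiome", "food"]


def expected_outputs(output_prefix: str, domains=DOMAINS) -> list[Path]:
    """List the output paths described above for a single output prefix."""
    # Outputs shared across all domains.
    paths = [
        Path(f"{output_prefix}_{name}.tsv")
        for name in ("matches", "library", "datasets")
    ]
    # Per-domain outputs: interactive tree, tree JSON, and count table.
    for domain in domains:
        paths.append(Path(f"{output_prefix}_{domain}.html"))
        paths.append(Path(f"{output_prefix}_{domain}.json"))
        paths.append(Path(f"{output_prefix}_count_{domain}.tsv"))
    return paths


def missing_outputs(output_prefix: str, domains=DOMAINS) -> list[Path]:
    """Return the expected paths that do not exist on disk yet."""
    return [p for p in expected_outputs(output_prefix, domains) if not p.exists()]
```

Calling `missing_outputs("output_directory/output_prefix")` after a run gives a quick view of which files are absent. Treat the result as a naming checklist only; it does not establish that every file is written for every spectrum.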

Execute batch run

  1. Open jobs.py and add entries to the files list as ("input_directory/input_file", "output_directory/output_prefix")
  2. Check and adjust the search parameters, such as minimum cosine score, m/z tolerance, and minimum number of matching peaks, based on your research question.
  3. Run jobs.py
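
The steps above can be sketched as a small pre-flight check run before launching a batch. Everything here is illustrative: the `SearchParams` class, its field names and values, and `validate_jobs` are hypothetical and do not mirror the actual variables in jobs.py; the paths are placeholders.

```python
from dataclasses import dataclass
from pathlib import Path

# Input types named in this README: .mgf files and USI lists as .csv or .tsv.
SUPPORTED_INPUTS = {".mgf", ".csv", ".tsv"}

# Each entry pairs an input file with an output prefix (placeholder paths).
files = [
    ("input_directory/input_file.mgf", "output_directory/output_prefix"),
    ("input_directory/usi_list.tsv", "output_directory/usi_prefix"),
]


@dataclass
class SearchParams:
    # Hypothetical names and illustrative values, not the repository defaults.
    min_cosine: float = 0.7
    mz_tolerance: float = 0.05
    min_matched_peaks: int = 3


def validate_jobs(files, params: SearchParams) -> list[str]:
    """Return a list of human-readable problems; empty means the setup looks sane."""
    problems = []
    for index, entry in enumerate(files):
        if not (isinstance(entry, tuple) and len(entry) == 2):
            problems.append(f"entry {index}: expected (input_file, output_prefix)")
            continue
        input_file, output_prefix = entry
        if Path(input_file).suffix.lower() not in SUPPORTED_INPUTS:
            problems.append(f"entry {index}: unsupported input type {input_file!r}")
        if not output_prefix:
            problems.append(f"entry {index}: empty output prefix")
    if not 0.0 < params.min_cosine <= 1.0:
        problems.append("min_cosine must be in (0, 1]")
    if params.mz_tolerance <= 0:
        problems.append("mz_tolerance must be positive")
    if params.min_matched_peaks < 1:
        problems.append("min_matched_peaks must be at least 1")
    return problems
```

Printing `validate_jobs(files, SearchParams())` before starting a long batch catches a malformed tuple or an out-of-range threshold early, which is cheaper than discovering it after the search has run.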

Note:

  1. You can run either a single .mgf file (generated via MZmine or by the molecular networking workflow in GNPS) or a list of USIs provided via a .csv or .tsv file.
  2. Run jobs.py a few times with the option skip_existing=True until no new output is generated. Some entries may fail because of the Fast Search API, but subsequent re-runs should catch all possible matches. (This should no longer be an issue.)
  3. Please make sure to use Python 3.10
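
The re-run advice in note 2 can be automated with a small loop that stops once a run adds no new files. This is a sketch under stated assumptions: `run_once` is a hypothetical callable that launches one batch (with skip_existing=True set inside jobs.py), and "no new output" is approximated by counting files in the output directory.

```python
import sys
from pathlib import Path
from typing import Callable


def python_version_ok() -> bool:
    """Check for Python 3.10, the version this README asks for."""
    return sys.version_info[:2] == (3, 10)


def count_outputs(output_dir: str) -> int:
    """Count regular files below output_dir (0 if it does not exist yet)."""
    root = Path(output_dir)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob("*") if p.is_file())


def rerun_until_stable(
    run_once: Callable[[], None], output_dir: str, max_runs: int = 5
) -> int:
    """Call run_once until a run adds no new files; return the number of runs made."""
    previous = count_outputs(output_dir)
    for run in range(1, max_runs + 1):
        run_once()
        current = count_outputs(output_dir)
        if current == previous:
            return run  # this run produced nothing new, so stop here
        previous = current
    return max_runs
```

One possible `run_once` is `lambda: subprocess.run([sys.executable, "jobs.py"], check=True)` after `import subprocess`. The `max_runs` cap is a design choice to avoid looping forever if some entries keep producing new files.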

Lineages

The lineages folder contains the complete lineage information for each NCBI taxonomy ID used in microbeMASST and plantMASST. The number of taxa these tools currently cover at each rank is:

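| Tool | Kingdom | Phylum | Class | Order | Family | Genus | Species | Strain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| microbeMASST | 8 | 20 | 48 | 124 | 278 | 561 | 1379 | 542 |
| plantMASST | 1 | 1 | 11 | 81 | 319 | 1796 | 3712 | NA |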

How to cite?

Please cite the following paper: microbeMASST: a taxonomically informed mass spectrometry search tool for microbial metabolomics data

About

Using MASST or fastMASST, adding metadata onto a tree ontology for microbes
