This repository contains the code and data for the different domain-specific MASSTs currently under development in the Dorrestein Lab at UC San Diego. This includes microbeMASST, plantMASST, tissueMASST, microbiomeMASST, and foodMASST. Aggregated search outputs can be generated and visualized using metadataMASST.
The code for the standalone web applications, which allow users to search one spectrum at a time, can be found in GNPS_MASST.
Standalone Web Apps:
Publications associated with the search tools:
- microbeMASST - Nature Microbiology
- plantMASST - bioRxiv
- tissueMASST - bioRxiv
- microbiomeMASST - bioRxiv
- foodMASST - npj Science of Food
Running jobs.py uses the Fast Search API to execute a batch search of multiple MS/MS spectra against the data currently indexed in GNPS/MassIVE, Metabolomics Workbench, MetaboLights, and NORMAN. It generates outputs for all listed domain-specific MASSTs simultaneously:
- A series of interactive HTML tree files will be generated, one for each domain-specific MASST, ending with _domain.html (e.g., _microbe.html)
- A series of JSON files for the different trees will be generated (e.g., _microbe.json)
- A _matches.tsv file will be generated. This contains all the scans found to match your searched spectrum in the currently indexed data. It also includes samples that are not part of the curated domain-specific MASSTs.
- A _library.tsv file will be generated. This contains a list of spectra from the GNPS libraries found to match your spectrum of interest. This enables a Level 2 annotation according to the Metabolomics Standards Initiative.
- A _datasets.tsv file will be generated. This contains the number of unique samples found to match your searched spectrum in each currently indexed dataset.
- A series of _count_domain.tsv files will be generated, containing information on matches found for each specific domain MASST.
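
As a minimal sketch of how the outputs listed above could be gathered after a run, the snippet below groups the files that share one output prefix by type. The function name `collect_outputs` and the group labels are hypothetical and not part of this repository. The suffixes are taken from the list above, and the exact naming of the count files (assumed here to contain `_count`) should be checked against your own output folder.

```python
from pathlib import Path

# Table suffixes as named in the output list above.
TABLE_SUFFIXES = ("_matches.tsv", "_library.tsv", "_datasets.tsv")


def collect_outputs(output_prefix: str) -> dict[str, list[Path]]:
    """Group the files written for one output prefix by output type.

    Hypothetical helper: it only inspects file names and does not
    read or validate the file contents.
    """
    prefix = Path(output_prefix)
    groups: dict[str, list[Path]] = {
        "trees_html": [],
        "trees_json": [],
        "counts": [],
        "tables": [],
    }
    # Every documented output starts with the prefix followed by "_".
    for path in sorted(prefix.parent.glob(f"{prefix.name}_*")):
        name = path.name
        if name.endswith(".html"):
            groups["trees_html"].append(path)
        elif name.endswith(".json"):
            groups["trees_json"].append(path)
        elif "_count" in name and name.endswith(".tsv"):
            # Assumption: count files look like <prefix>_count_<domain>.tsv
            groups["counts"].append(path)
        elif name.endswith(TABLE_SUFFIXES):
            groups["tables"].append(path)
    return groups
```

Calling `collect_outputs("output_directory/output_prefix")` returns a dictionary of path lists, which makes it easy to see at a glance which output types are still missing for a given search.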
- Navigate to jobs.py and add entries to the files list as ("input_directory/input_file", "output_directory/output_prefix")
- Check and adjust the search parameters, such as minimum cosine score, m/z tolerance, and minimum number of matching peaks, based on your research question.
- Run jobs.py
- The input can be a single .mgf file generated via MZmine or from the molecular networking workflow in GNPS, or a list of USIs provided as a .csv or .tsv file.
- Make sure to run jobs.py a couple of times with the option skip_existing=True, until no new output is generated. Because of the Fast Search API, some of the entries may fail. Nevertheless, subsequent re-runs should catch all the possible matches. (This should not be an issue anymore.)
- Please make sure to use Python 3.10
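
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in jobs.py. The file names in `files` are placeholders, and `check_entries`, `count_outputs`, and `rerun_until_stable` are hypothetical helpers. `run_batch` stands for whatever callable launches one pass of the batch search in your setup.

```python
from pathlib import Path
from typing import Callable

# Placeholder entries in the (input, output prefix) format described above.
files = [
    ("input_directory/input_file.mgf", "output_directory/output_prefix"),
    ("input_directory/usi_list.tsv", "output_directory/usi_prefix"),
]

# Input types named in the steps above: .mgf spectra or USI lists in .csv/.tsv.
ALLOWED_INPUTS = {".mgf", ".csv", ".tsv"}


def check_entries(entries):
    """Return the entries whose input file type is not .mgf, .csv, or .tsv."""
    return [e for e in entries if Path(e[0]).suffix.lower() not in ALLOWED_INPUTS]


def count_outputs(output_prefix: str) -> int:
    """Count the files already written for one output prefix."""
    prefix = Path(output_prefix)
    return sum(1 for _ in prefix.parent.glob(f"{prefix.name}_*"))


def rerun_until_stable(
    run_batch: Callable[[], None], entries, max_runs: int = 5
) -> int:
    """Call run_batch until a pass adds no new output files.

    Returns the number of passes executed. Assumes run_batch skips
    existing outputs, as with skip_existing=True.
    """
    previous = sum(count_outputs(out) for _, out in entries)
    for run in range(1, max_runs + 1):
        run_batch()
        current = sum(count_outputs(out) for _, out in entries)
        if current == previous:
            return run  # nothing new was written, so stop here
        previous = current
    return max_runs
```

Counting files is only a rough proxy for "no new output is generated": it cannot detect files that were overwritten, so it relies on existing outputs being skipped between passes.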
Within the folder lineages you can find the complete lineage information for each NCBI taxonomy ID used in microbeMASST and plantMASST. These tools currently cover:
| Tool | Kingdom | Phylum | Class | Order | Family | Genus | Species | Strain |
|---|---|---|---|---|---|---|---|---|
| microbeMASST | 8 | 20 | 48 | 124 | 278 | 561 | 1379 | 542 |
| plantMASST | 1 | 1 | 11 | 81 | 319 | 1796 | 3712 | NA |
Please cite the following paper: microbeMASST: a taxonomically informed mass spectrometry search tool for microbial metabolomics data