Nami Sakamoto, Takaki Oka, Yuki Matsuzawa, Kozo Nishida, Jayashankar Jayaprakash, Aya Hori, Makoto Arita, Hiroshi Tsugawa
MS2Lipid: A Lipid Subclass Prediction Program Using Machine Learning and Curated Tandem Mass Spectral Data
Metabolites, volume 14, issue 11. Published 2024-11-07. DOI: 10.3390/metabo14110602 (https://doi.org/10.3390/metabo14110602)
Impact factor: 3.4 (JCR Q2, Biochemistry & Molecular Biology). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11596251/pdf/
Citations: 0
Abstract
Background: Untargeted lipidomics using collision-induced dissociation-based tandem mass spectrometry (CID-MS/MS) is essential for biological and clinical applications. However, annotation confidence still relies on manual curation by analytical chemists, despite the development of various software tools for automatic spectral processing based on rule-based fragment annotations.
Methods: In this study, we present a novel machine learning model, MS2Lipid, for the prediction of known lipid subclasses from MS/MS queries, providing an orthogonal approach to existing lipidomics software programs in determining the lipid subclass of ion features. We designed a new descriptor, MCH (mode of carbon and hydrogen), to increase the specificity of lipid subclass prediction in nominal mass resolution MS data.
Results: The model, trained with 6760 and 6862 manually curated MS/MS spectra for the positive and negative ion modes, respectively, classified queries into one or several of 97 lipid subclasses, achieving an accuracy of 97.4% in the test set. The program was further validated using various datasets from different instruments and curators, with the average accuracy exceeding 87.2%. Using an integrated approach with molecular spectral networking, we demonstrated the utility of MS2Lipid by annotating microbiota-derived esterified bile acids, whose abundance was significantly increased in fecal samples of obese patients in a human cohort study. This suggests that the machine learning model provides an independent criterion for lipid subclass classification, enhancing the annotation of lipid metabolites within known lipid classes.
Conclusions: MS2Lipid is a highly accurate machine learning model that enhances lipid subclass annotation from MS/MS data and provides an independent criterion.
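The abstract does not describe the model's internals, so the following is only a minimal sketch of the general task it names: turning an MS/MS peak list into nominal-mass features, ranking candidate subclasses, and scoring accuracy. It is not the MS2Lipid model and does not implement the MCH descriptor, whose definition is not given here. The cosine-similarity nearest-reference ranking is a stand-in technique, and every function name, subclass label, and peak value below is hypothetical toy data.

```python
from collections import defaultdict
from math import sqrt


def bin_spectrum(peaks):
    """Sum (m/z, intensity) peaks into integer m/z bins, scaled so the top bin is 1.

    Rounding to the nearest integer is a simplification of nominal-mass binning.
    """
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[round(mz)] += intensity
    top = max(bins.values(), default=0.0)
    return {k: v / top for k, v in bins.items()} if top > 0 else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_subclasses(query_peaks, references, top_n=3):
    """Rank labels by their best-matching reference spectrum (stand-in classifier)."""
    query = bin_spectrum(query_peaks)
    scores = {
        label: max(cosine(query, bin_spectrum(ref)) for ref in spectra)
        for label, spectra in references.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def accuracy(predicted, expected):
    """Fraction of queries whose top-ranked label equals the curated label."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


# Hypothetical toy library: labels and peaks are invented for illustration only.
references = {
    "subclass_A": [[(184.07, 100.0), (104.11, 10.0)]],
    "subclass_B": [[(264.27, 100.0), (282.28, 40.0)]],
}
query = [(184.08, 80.0), (104.10, 5.0), (86.10, 3.0)]
ranked = rank_subclasses(query, references)
print(ranked[0][0])
```

Returning a ranked list instead of a single label mirrors the abstract's statement that queries are classified into "one or several" subclasses. A real model would learn from thousands of curated spectra per ion mode, not from a single reference per label.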
Journal: Metabolites
Subject area: Biochemistry, Genetics and Molecular Biology - Molecular Biology
CiteScore: 5.70
Self-citation rate: 7.30%
Articles published: 1070
Review time: 17.17 days
About the journal:
Metabolites (ISSN 2218-1989) is an international, peer-reviewed open access journal of metabolism and metabolomics. Metabolites publishes original research articles and review articles in all molecular aspects of metabolism relevant to the fields of metabolomics, metabolic biochemistry, computational and systems biology, biotechnology and medicine, with a particular focus on the biological roles of metabolites and small molecule biomarkers. Metabolites encourages scientists to publish their experimental and theoretical results in as much detail as possible. Therefore, there is no restriction on article length. Sufficient experimental details must be provided to enable the results to be accurately reproduced. Electronic material representing additional figures, materials and methods explanation, or supporting results and evidence can be submitted with the main manuscript as supplementary material.