Machine learning methods for compound annotation in non-targeted mass spectrometry—A brief overview of fingerprinting, in silico fragmentation and de novo methods
Francesco F. Russo, Yannek Nowatzky, Carsten Jaeger, Maria K. Parr, Phillipp Benner, Thilo Muth, Jan Lisec
{"title":"Machine learning methods for compound annotation in non-targeted mass spectrometry—A brief overview of fingerprinting, in silico fragmentation and de novo methods","authors":"Francesco F. Russo, Yannek Nowatzky, Carsten Jaeger, Maria K. Parr, Phillipp Benner, Thilo Muth, Jan Lisec","doi":"10.1002/rcm.9876","DOIUrl":null,"url":null,"abstract":"<p>Non-targeted screenings (NTS) are essential tools in different fields, such as forensics, health and environmental sciences. NTSs often employ mass spectrometry (MS) methods due to their high throughput and sensitivity in comparison to, for example, nuclear magnetic resonance–based methods. As the identification of mass spectral signals, called annotation, is labour intensive, it has been used for developing supporting tools based on machine learning (ML). However, both the diversity of mass spectral signals and the sheer quantity of different ML tools developed for compound annotation present a challenge for researchers in maintaining a comprehensive overview of the field.</p><p>In this work, we illustrate which ML-based methods are available for compound annotation in non-targeted MS experiments and provide a nuanced comparison of the ML models used in MS data analysis, unravelling their unique features and performance metrics. Through this overview we support researchers to judiciously apply these tools in their daily research. 
This review also offers a detailed exploration of methods and datasets to show gaps in current methods, and promising target areas, offering a starting point for developers intending to improve existing methodologies.</p>","PeriodicalId":225,"journal":{"name":"Rapid Communications in Mass Spectrometry","volume":"38 20","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/rcm.9876","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Rapid Communications in Mass Spectrometry","FirstCategoryId":"92","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/rcm.9876","RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Abstract
Non-targeted screenings (NTS) are essential tools in fields such as forensics, health and environmental sciences. NTS often employ mass spectrometry (MS) methods because of their high throughput and sensitivity compared with, for example, nuclear magnetic resonance-based methods. Because the identification of mass spectral signals, called annotation, is labour intensive, it has become a target for supporting tools based on machine learning (ML). However, both the diversity of mass spectral signals and the sheer number of ML tools developed for compound annotation make it difficult for researchers to maintain a comprehensive overview of the field.
In this work, we illustrate which ML-based methods are available for compound annotation in non-targeted MS experiments and provide a nuanced comparison of the ML models used in MS data analysis, highlighting their distinguishing features and performance metrics. This overview is intended to help researchers apply these tools judiciously in their daily research. The review also explores methods and datasets in detail to reveal gaps in current methods and promising target areas, offering a starting point for developers who intend to improve existing methodologies.
About the journal:
Rapid Communications in Mass Spectrometry aims at the rapid publication of original research results and ideas on all aspects of the science of gas-phase ions; it covers all the associated scientific disciplines. There is no formal limit on paper length ("rapid" is not synonymous with "brief"), but papers should be of a length commensurate with the importance and complexity of the results being reported. Contributions may be theoretical or practical in nature; they may deal with methods, techniques and applications, or with the interpretation of results; they may cover any area of science that depends directly on measurements made upon gaseous ions or that is associated with such measurements.