PAH-Finder: A Pattern Recognition Workflow for Identification of PAHs and Their Derivatives

IF 6.7 | Zone 1 (Chemistry) | JCR Q1 (Chemistry, Analytical) | Analytical Chemistry | Pub Date: 2025-01-07 | DOI: 10.1021/acs.analchem.4c04249
Zixuan Zhang, Xin Xu, Shipei Xing, Changzhi Shi, Zecang You, Xiaojun Deng, Ling Tan, Zhe Mo, Mingliang Fang
Citation count: 0

Abstract

Polycyclic aromatic hydrocarbons (PAHs) are pervasive environmental pollutants that pose significant health risks because of their carcinogenic, mutagenic, and teratogenic properties. Traditional PAH identification relies primarily on gas chromatography–mass spectrometry (GC–MS), combining spectral library searches with other techniques such as mass defect analysis. However, these methods are limited by incomplete spectral libraries and a high false positive rate. Here, we present PAH-Finder, a data-driven workflow that integrates machine learning with high-resolution mass spectrometry (HRMS). PAH-Finder introduces a novel approach to evaluating the fragment distribution of PAH backbones in MS spectra by normalizing fragment m/z values to a 0–100% range relative to the molecular ion peak. Seven machine learning features capture PAH fragmentation characteristics, and a random forest model trained on 98 PAH spectra and 1003 background spectra achieved an F1 score of ∼0.9 in 5-fold cross-validation. Additionally, PAH-Finder uses the presence of doubly charged fragments and molecular formula prediction to improve identification accuracy. In a case study, PAH-Finder identified 135 PAHs, including 7 types of previously unreported PAH formulas in particulate matter samples, demonstrating a 246% increase in annotation efficiency compared to the NIST20 library search. It also identified 32 heteroatom-doped PAHs not included in the training data set, indicating that the model generalizes robustly. PAH-Finder's high accuracy in detecting a broad spectrum of PAHs facilitates efficient data processing and interpretation for nontargeted analysis, improving our understanding of air pollution and supporting public health protection. PAH-Finder is freely available on GitHub (https://github.com/FangLabNTU/PAH-Finder).
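
The normalization step described in the abstract (expressing each fragment m/z as a percentage of the molecular ion m/z) can be illustrated with a minimal Python sketch. This is not the PAH-Finder implementation: the function names `normalize_fragments` and `binned_distribution`, the 10% bin width, the choice to drop peaks above the molecular ion, and the example peak list are all assumptions made for illustration. The seven features actually used by the model are not specified in the abstract and are not reproduced here.

```python
def normalize_fragments(peaks, molecular_ion_mz):
    """Map each fragment m/z to a 0-100% position relative to the molecular ion.

    peaks: list of (mz, intensity) tuples.
    Assumption: peaks above the molecular ion (e.g. isotope peaks) are dropped.
    """
    if molecular_ion_mz <= 0:
        raise ValueError("molecular_ion_mz must be positive")
    return [
        (100.0 * (mz / molecular_ion_mz), intensity)
        for mz, intensity in peaks
        if 0 < mz <= molecular_ion_mz
    ]


def binned_distribution(normalized, bin_width=10.0):
    """Return the fraction of total intensity falling in each normalized bin.

    Assumption: equal-width bins; a position of exactly 100% goes in the last bin.
    """
    n_bins = int(round(100.0 / bin_width))
    bins = [0.0] * n_bins
    total = sum(intensity for _, intensity in normalized)
    if total == 0:
        return bins
    for position, intensity in normalized:
        index = min(int(position // bin_width), n_bins - 1)
        bins[index] += intensity / total
    return bins


# Illustrative peak list only (not measured data): a molecular ion at
# m/z 202.078 plus three hypothetical fragment peaks.
example_peaks = [(202.078, 100.0), (200.062, 15.0), (150.05, 5.0), (101.039, 20.0)]
normalized = normalize_fragments(example_peaks, molecular_ion_mz=202.078)
distribution = binned_distribution(normalized, bin_width=10.0)

for i, share in enumerate(distribution):
    print(f"{i * 10:3d}-{(i + 1) * 10:3d}%: {share:.3f}")
```

Because positions are expressed relative to the molecular ion, spectra of compounds with different masses land on the same 0–100% axis, which is what makes a single set of distribution-based features comparable across PAHs of different sizes.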

Source journal: Analytical Chemistry (Chemistry, Analytical Chemistry)
CiteScore: 12.10
Self-citation rate: 12.20%
Articles published: 1949
Review time: 1.4 months
Journal description: Analytical Chemistry, a peer-reviewed research journal, focuses on disseminating new and original knowledge across all branches of analytical chemistry. Fundamental articles may explore general principles of chemical measurement science and need not directly address existing or potential analytical methodology. They can be entirely theoretical or report experimental results. Contributions may cover various phases of analytical operations, including sampling, bioanalysis, electrochemistry, mass spectrometry, microscale and nanoscale systems, environmental analysis, separations, spectroscopy, chemical reactions and selectivity, instrumentation, imaging, surface analysis, and data processing. Papers discussing known analytical methods should present a significant, original application of the method, a notable improvement, or results on an important analyte.