Machine Learning-Based SERS Chemical Space for Two-Way Prediction of Structures and Spectra of Untrained Molecules.

Journal of the American Chemical Society (IF 14.4; Zone 1, Chemistry; JCR Q1, Chemistry, Multidisciplinary). Pub Date: 2025-02-14. DOI: 10.1021/jacs.4c15804
Jaslyn Ru Ting Chen, Emily Xi Tan, Jingxiang Tang, Shi Xuan Leong, Sean Kai Xun Hue, Chi Seng Pun, In Yee Phang, Xing Yi Ling

Abstract

Identifying unknown molecules beyond existing databases remains challenging in surface-enhanced Raman scattering (SERS) spectroscopy. Conventional SERS analysis relies on matching experimental spectra against cataloged spectra, which limits identification to molecules already in databases. With a vast chemical space of >10^60 molecules, it is impractical to obtain the spectra of every molecule or to rely solely on in silico techniques for spectral predictions. Here, we showcase an ML-based SERS chemical space that leverages key spectra-structure correlations to achieve two-way spectra-to-structure and structure-to-spectra predictions for untrained molecules with >90% average accuracy. Using a SERS chemical space comprising 38 linear molecules from four classes (alcohols, aldehydes, amines, and carboxylic acids), our experimental and in silico studies reveal underlying spectral features that enable the prediction of untrained molecules represented by two molecular descriptors (functional group and carbon chain length). For forward spectra-to-structure predictions, we devise a two-step "classification and regression" ML framework to sequentially predict the functional group and carbon chain length of untrained molecules with 100% accuracy and ≤1 carbon difference, respectively. In addition, using an eXtreme Gradient Boosting (XGBoost) regressor trained on the two molecular descriptors, we attain inverse structure-to-spectra prediction with a high average cosine similarity of 90.4% between the predicted and experimental spectra. Our ML-based SERS chemical space represents a shift in molecular identification from traditional spectral matching to predictive modeling of spectra-structure relationships. These insights could motivate the expansion of SERS chemical spaces and help meet the demand for present and future SERS technologies that accurately identify unknowns across diverse fields.
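
The abstract names two ideas that a short sketch can make concrete: the cosine similarity used to compare predicted and experimental spectra, and the two-step "classify the functional group, then estimate the chain length" flow. The Python below is only an illustration under stated assumptions. It is not the authors' implementation: the nearest-centroid classifier and the nearest-neighbor chain-length estimate are simple stand-ins for the ML models in the paper, and the function names and the four-bin toy spectra are hypothetical.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two spectra sampled on the same grid."""
    if len(a) != len(b):
        raise ValueError("spectra must share the same wavenumber grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a spectrum has zero norm")
    return dot / (norm_a * norm_b)


def predict_structure(query, training):
    """Two-step prediction: functional group first, then chain length.

    training is a list of (spectrum, functional_group, chain_length).
    Both steps are stand-ins chosen for brevity, not the paper's models.
    """
    # Group the training spectra by functional group.
    by_group = {}
    for spectrum, group, length in training:
        by_group.setdefault(group, []).append((spectrum, length))

    # Step 1 (classification stand-in): nearest mean spectrum per group.
    centroids = {
        group: [mean(column) for column in zip(*(s for s, _ in members))]
        for group, members in by_group.items()
    }
    group = max(centroids, key=lambda g: cosine_similarity(query, centroids[g]))

    # Step 2 (regression stand-in): chain length of the most similar
    # training spectrum within the predicted group.
    _, length = max(by_group[group], key=lambda m: cosine_similarity(query, m[0]))
    return group, length


# Made-up toy spectra (four intensity bins each), for illustration only.
toy_training = [
    ([1.0, 0.0, 0.2, 0.0], "alcohol", 2),
    ([1.0, 0.0, 0.6, 0.0], "alcohol", 4),
    ([0.0, 1.0, 0.2, 0.0], "amine", 2),
    ([0.0, 1.0, 0.6, 0.0], "amine", 4),
]
print(predict_structure([0.9, 0.1, 0.55, 0.0], toy_training))
```

The same cosine_similarity function could score an inverse structure-to-spectra model by comparing each predicted spectrum with its measured counterpart and averaging the results.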
