TransPeakNet for solvent-aware 2D NMR prediction via multi-task pre-training and unsupervised learning.

Communications Chemistry · IF 6.2 · Zone 2 (Chemistry) · JCR Q1 (Chemistry, Multidisciplinary) · Pub Date: 2025-02-20 · DOI: 10.1038/s42004-025-01455-9
Yunrui Li, Hao Xu, Ambrish Kumar, Duo-Sheng Wang, Christian Heiss, Parastoo Azadi, Pengyu Hong
Communications Chemistry, volume 8 (1), 51. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11842623/pdf/
Citations: 0


TransPeakNet for solvent-aware 2D NMR prediction via multi-task pre-training and unsupervised learning.

Nuclear Magnetic Resonance (NMR) spectroscopy is essential for revealing molecular structure, electronic environment, and dynamics. Accurate NMR shift prediction allows researchers to validate structures by comparing predicted and observed shifts. While Machine Learning (ML) has improved one-dimensional (1D) NMR shift prediction, predicting 2D NMR remains challenging due to limited annotated data. To address this, we introduce an unsupervised training framework for predicting cross-peaks in 2D NMR, specifically Heteronuclear Single Quantum Coherence (HSQC). Our approach pretrains an ML model on an annotated 1D dataset of ¹H and ¹³C shifts, then finetunes it in an unsupervised manner using unlabeled HSQC data, which simultaneously generates cross-peak annotations. Our model also adjusts for solvent effects. Evaluation on 479 expert-annotated HSQC spectra demonstrates our model's superiority over traditional methods (ChemDraw and Mestrenova), achieving Mean Absolute Errors (MAEs) of 2.05 ppm and 0.165 ppm for ¹³C shifts and ¹H shifts respectively. Our algorithmic annotations show a 95.21% concordance with experts' assignments, underscoring the approach's potential for structural elucidation in fields like organic chemistry, pharmaceuticals, and natural products.
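As a rough illustration of how predicted shifts can be compared with observed HSQC cross-peaks, the sketch below pairs each predicted (13C, 1H) shift with one observed cross-peak and then reports a per-nucleus MAE. It is not the authors' implementation: the brute-force matching, the scaling constants `C_SCALE` and `H_SCALE`, the function names, and the toy shift values are all assumptions made for illustration only.

```python
from itertools import permutations

# Hypothetical scaling so that 13C (wide ppm range) and 1H (narrow ppm range)
# contribute comparably to the matching cost. These values are assumptions.
C_SCALE = 10.0
H_SCALE = 1.0


def match_cost(pred, obs):
    """Scaled L1 distance between a predicted and an observed (C, H) peak."""
    return abs(pred[0] - obs[0]) / C_SCALE + abs(pred[1] - obs[1]) / H_SCALE


def assign_peaks(predicted, observed):
    """Brute-force minimum-cost one-to-one assignment (only for a few peaks).

    Returns a tuple `perm` where predicted[i] is paired with observed[perm[i]].
    """
    if len(predicted) != len(observed):
        raise ValueError("this sketch assumes equal numbers of peaks")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(observed))):
        cost = sum(match_cost(p, observed[j]) for p, j in zip(predicted, perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm


def mean_absolute_errors(predicted, observed, perm):
    """Per-nucleus MAE (13C, 1H) in ppm over the matched peak pairs."""
    n = len(predicted)
    mae_c = sum(abs(predicted[i][0] - observed[perm[i]][0]) for i in range(n)) / n
    mae_h = sum(abs(predicted[i][1] - observed[perm[i]][1]) for i in range(n)) / n
    return mae_c, mae_h


# Toy (13C ppm, 1H ppm) values, invented for this example.
predicted = [(20.0, 1.0), (60.0, 3.5), (128.0, 7.2)]
observed = [(127.0, 7.3), (21.0, 1.1), (61.5, 3.4)]

perm = assign_peaks(predicted, observed)
mae_c, mae_h = mean_absolute_errors(predicted, observed, perm)
print(perm, round(mae_c, 3), round(mae_h, 3))
```

The brute-force search grows factorially with the number of peaks, so for realistic spectra a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` would be the more practical choice.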

Source journal: Communications Chemistry (Chemistry: General Chemistry)
CiteScore: 7.70
Self-citation rate: 1.70%
Articles published: 146
Review time: 13 weeks
About the journal: Communications Chemistry is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the chemical sciences. Research papers published by the journal represent significant advances bringing new chemical insight to a specialized area of research. We also aim to provide a community forum for issues of importance to all chemists, regardless of sub-discipline.