Hao Li, Da Long, Li Yuan, Yu Wang, Yonghong Tian, Xinchang Wang, Fanyang Mo
Title: Decoupled peak property learning for efficient and interpretable electronic circular dichroism spectrum prediction
Journal: Nature Computational Science
DOI: 10.1038/s43588-024-00757-7 (https://doi.org/10.1038/s43588-024-00757-7)
Publication date: 2025-01-03
Publication type: Journal Article
Impact factor: 12.0 (JCR Q1, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
Electronic circular dichroism (ECD) spectra carry key information about molecular chirality: they discriminate the absolute configurations of chiral molecules, which is crucial in asymmetric organic synthesis and the drug industry. However, existing predictive approaches do not address ECD spectra, owing to data scarcity and to interpretability too limited for trustworthy prediction. Here we establish a large-scale dataset of ECD spectra for chiral molecules and propose ECDFormer for accurate and interpretable ECD spectrum prediction. ECDFormer decomposes ECD spectra into peak entities, uses the QFormer architecture to learn peak properties and renders the peaks into spectra. Compared with spectrum sequence prediction methods, our decoupled peak prediction approach substantially improves both accuracy and efficiency, raising the peak symbol accuracy from 37.3% to 72.7% and reducing the time cost from an average of 4.6 central processing unit hours to 1.5 s. Moreover, ECDFormer demonstrates the ability to capture molecular orbital information directly from spectral data through its explainable peak-decoupling approach. Furthermore, ECDFormer proves equally proficient at predicting other types of spectrum, including infrared and mass spectra, highlighting its substantial generalization capabilities.
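The decompose-then-render idea can be illustrated with a small sketch. It is not ECDFormer: there is no learned model here, and the Gaussian band shape, the `width` parameter, the local-extremum peak finder and the function names are all assumptions made for illustration. The `peak_sign_accuracy` function is only one plausible reading of "peak symbol accuracy", which the abstract does not define.

```python
import math


def render_spectrum(peaks, wavelengths, width=10.0):
    """Render (position, signed_height) peaks as a sum of Gaussian bands.

    The Gaussian shape and the shared width are assumptions for this sketch.
    """
    return [
        sum(h * math.exp(-((w - p) ** 2) / (2 * width ** 2)) for p, h in peaks)
        for w in wavelengths
    ]


def extract_peaks(wavelengths, intensities):
    """Decompose a sampled spectrum into (position, sign) peak entities.

    A peak is a strict interior local maximum above zero (sign +1)
    or a strict interior local minimum below zero (sign -1).
    """
    peaks = []
    for i in range(1, len(intensities) - 1):
        prev, cur, nxt = intensities[i - 1], intensities[i], intensities[i + 1]
        is_max = cur > prev and cur > nxt and cur > 0
        is_min = cur < prev and cur < nxt and cur < 0
        if is_max or is_min:
            peaks.append((wavelengths[i], 1 if cur > 0 else -1))
    return peaks


def peak_sign_accuracy(true_signs, pred_signs):
    """Fraction of reference peaks whose sign matches the prediction at the same index.

    Hypothetical metric; the paper's exact definition may differ.
    """
    if not true_signs:
        return 0.0
    matches = sum(t == p for t, p in zip(true_signs, pred_signs))
    return matches / len(true_signs)


if __name__ == "__main__":
    grid = list(range(200, 401))                      # wavelengths in nm
    toy_peaks = [(230, 1.0), (280, -0.6), (330, 0.4)]  # made-up peak list
    spectrum = render_spectrum(toy_peaks, grid)
    print(extract_peaks(grid, spectrum))
```

Working on a handful of peak entities rather than on every point of the sampled curve is what makes the representation compact: the toy spectrum above has 201 samples but only three peaks to describe.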