Leveraging infrared spectroscopy for automated structure elucidation
Authors: Marvin Alberts, Teodoro Laino, Alain C. Vaucher
Journal: Communications Chemistry, pp. 1-11 (JCR Q1, Chemistry, Multidisciplinary; impact factor 5.9)
Published: 2024-11-16
DOI: 10.1038/s42004-024-01341-w
URL: https://www.nature.com/articles/s42004-024-01341-w
PDF: https://www.nature.com/articles/s42004-024-01341-w.pdf
The application of machine learning models in chemistry has made remarkable strides in recent years. While analytical chemistry has received considerable interest from machine learning practitioners, the adoption of these models into everyday use remains limited. Among the available analytical methods, infrared (IR) spectroscopy stands out in terms of affordability, simplicity, and accessibility. However, its use has been limited to the identification of a select few functional groups, as most peaks lie beyond human interpretation. We present a transformer model that enables chemists to leverage the complete information contained within an IR spectrum to directly predict the molecular structure. To cover a large chemical space, we pretrain the model using 634,585 simulated IR spectra and fine-tune it on 3,453 experimental spectra. Our approach achieves a top-1 accuracy of 44.4% and a top-10 accuracy of 69.8% on compounds containing 6 to 13 heavy atoms. When solely predicting scaffolds, the model ranks the correct scaffold first in 84.5% of cases and within the top 10 in 93.0% of cases.
Editor's summary: Infrared spectroscopy stands out as an analytical tool for its affordability, simplicity, and accessibility; however, its use has been limited to the identification of a select few functional groups, as most peaks lie beyond human interpretation. Here, the authors use a transformer model that enables chemists to leverage all information contained within an IR spectrum to directly predict the molecular structure.
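The top-1 and top-10 figures quoted in the abstract are top-k accuracies: a prediction counts as correct when the true structure appears among the model's k highest-ranked candidates. The following is a minimal Python sketch of that metric, not the authors' evaluation code. It assumes each structure is given as a SMILES string that has already been canonicalized, so exact string comparison is meaningful; the function name `top_k_accuracy` and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of targets found among the first k ranked candidates.

    Assumption: all strings are canonical SMILES, so equal structures
    compare equal as plain strings.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked candidate list per target")
    if not targets:
        raise ValueError("no targets given")
    hits = sum(
        target in candidates[:k]
        for candidates, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)


# Toy data, invented for illustration (not taken from the paper).
# Each inner list is ordered from most to least likely candidate.
predictions = [
    ["CCO", "COC", "CC=O"],
    ["CC(=O)O", "OCC=O", "COC=O"],
    ["c1ccccc1", "C1CCCCC1", "C1=CCCCC1"],
    ["CCN", "CNC", "NCC=O"],
]
targets = ["CCO", "OCC=O", "Cc1ccccc1", "CNC"]

top1 = top_k_accuracy(predictions, targets, k=1)
top3 = top_k_accuracy(predictions, targets, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")  # prints "top-1: 0.25, top-3: 0.75"
```

A scaffold-level score of the kind reported in the abstract could be computed the same way by first mapping every predicted and target structure to its scaffold and then comparing those, which is why scaffold accuracy is at least as high as full-structure accuracy.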
About the journal:
Communications Chemistry is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the chemical sciences. Research papers published by the journal represent significant advances bringing new chemical insight to a specialized area of research. We also aim to provide a community forum for issues of importance to all chemists, regardless of sub-discipline.