Property Prediction for Complex Compounds Using Structure-Free Mendeleev Encoding and Machine Learning
Zixin Zhuang, Amanda S Barnard
Journal of Chemical Information and Modeling, published 2024-12-12
DOI: 10.1021/acs.jcim.4c01343
Citations: 0
Abstract
Predicting the properties of unseen materials exclusively on the basis of the chemical formula, before synthesis and characterization, has advantages for research and resource planning. This can be achieved using suitable structure-free encoding and machine learning methods, but additional processing decisions are required. In this study, we compare a variety of structure-free materials encodings and machine learning algorithms to predict the structure/property relationships of battery materials. We found that the physical units used to measure the property labels have an important impact on the predictive ability of the models, regardless of the computational approach. Property labels expressed per unit weight give excellent performance, but property labels expressed per unit volume cannot be predicted with confidence using only chemical information, even when the underlying physical characteristics are the same. These results contrast with previous studies of unsupervised learning and classification, where structure-free encoding excelled, and highlight that the way the structural features or property labels of materials are represented plays an important role in the predictive ability of machine learning models.
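The contrast between weight-based and volume-based labels can be illustrated with a small sketch. The code below is not the authors' Mendeleev encoding: the element table, the choice of fraction-weighted mean and range as features, and every function name (`parse_formula`, `encode`, `gravimetric_capacity`, `volumetric_capacity`) are hypothetical. It builds a structure-free feature vector from a chemical formula, computes a theoretical gravimetric capacity (which needs only the molar mass), and shows that converting to a volumetric capacity requires a density, a structure-dependent quantity that the formula encoding does not contain.

```python
import re
from collections import defaultdict

FARADAY = 96485.33212  # Faraday constant, C/mol

# Hypothetical, illustrative element table: atomic mass (g/mol),
# Pauling electronegativity, atomic number. A real encoding would draw
# on a complete periodic-table dataset.
ELEMENTS = {
    "Li": {"mass": 6.94, "chi": 0.98, "Z": 3},
    "Co": {"mass": 58.933, "chi": 1.88, "Z": 27},
    "Fe": {"mass": 55.845, "chi": 1.83, "Z": 26},
    "P": {"mass": 30.974, "chi": 2.19, "Z": 15},
    "O": {"mass": 15.999, "chi": 3.44, "Z": 8},
}


def parse_formula(formula: str) -> dict[str, float]:
    """Split a flat formula such as 'LiFePO4' into element counts (no brackets)."""
    counts: dict[str, float] = defaultdict(float)
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] += float(count) if count else 1.0
    return dict(counts)


def encode(formula: str) -> list[float]:
    """Structure-free features: atomic-fraction-weighted mean and range of each property."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    features: list[float] = []
    for prop in ("mass", "chi", "Z"):
        values = [ELEMENTS[el][prop] for el in counts]
        mean = sum(n / total * ELEMENTS[el][prop] for el, n in counts.items())
        features.extend([mean, max(values) - min(values)])
    return features


def molar_mass(formula: str) -> float:
    """Formula mass in g/mol."""
    return sum(n * ELEMENTS[el]["mass"] for el, n in parse_formula(formula).items())


def gravimetric_capacity(formula: str, electrons: float = 1.0) -> float:
    """Theoretical capacity in mAh/g, n*F / (3.6*M); needs only the formula (1 mAh = 3.6 C)."""
    return electrons * FARADAY / (3.6 * molar_mass(formula))


def volumetric_capacity(gravimetric_mah_per_g: float, density_g_per_cm3: float) -> float:
    """mAh/cm^3 = mAh/g * g/cm^3; the density depends on the crystal structure."""
    return gravimetric_mah_per_g * density_g_per_cm3


if __name__ == "__main__":
    q_grav = gravimetric_capacity("LiFePO4")  # about 170 mAh/g
    q_vol = volumetric_capacity(q_grav, 3.6)  # 3.6 g/cm^3 is an assumed density
    print(encode("LiFePO4"))
    print(round(q_grav, 1), round(q_vol, 1))
```

Because the features are built from atomic fractions, LiFePO4 and Li2Fe2P2O8 encode identically, and so would any two polymorphs of the same composition. A per-weight label such as theoretical capacity can still be derived from the formula, whereas a per-volume label also varies with density; this offers one plausible reading of why volume-based labels could not be predicted with confidence from chemical information alone.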
About the journal:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.