Brent Vela, Trevor Hastings, Marshall Allen and Raymundo Arróyave
Multi-Principal Element Alloys (MPEAs) have emerged as an exciting area of research in materials science in the 2020s, owing to the vast potential for discovering alloys with unique and tailored properties enabled by the combinations of elements. However, the chemical complexity of MPEAs poses a significant challenge in visualizing composition–property relationships in high-dimensional design spaces. Without effective visualization techniques, designing chemically complex alloys is practically impossible. In this methods article, we present a suite of visualization techniques that allow for meaningful and insightful visualizations of MPEA composition spaces and property spaces. Our contribution to this suite is the projection of entire alloy spaces for the purposes of design. We deploy this suite of visualization techniques on the following MPEA case studies: (1) a constraint-satisfaction alloy design scheme, (2) Bayesian optimization alloy design campaigns, and (3) various other scenarios in the ESI. Furthermore, we show how this method can be applied to any barycentric design space. While there is no one-size-fits-all visualization technique, our toolbox offers a range of methods and best practices that can be tailored to specific MPEA research needs. This article is intended for materials scientists interested in performing research on multi-principal element alloys, chemically complex alloys, or high entropy alloys and is expected to facilitate the discovery of novel and tailored properties in MPEAs.
{"title":"Visualizing high entropy alloy spaces: methods and best practices†","authors":"Brent Vela, Trevor Hastings, Marshall Allen and Raymundo Arróyave","doi":"10.1039/D4DD00262H","DOIUrl":"https://doi.org/10.1039/D4DD00262H","url":null,"abstract":"<p >Multi-Principal Element Alloys (MPEAs) have emerged as an exciting area of research in materials science in the 2020s, owing to the vast potential for discovering alloys with unique and tailored properties enabled by the combinations of elements. However, the chemical complexity of MPEAs poses a significant challenge in visualizing composition–property relationships in high-dimensional design spaces. Without effective visualization techniques, designing chemically complex alloys is practically impossible. In this methods article, we present a suite of visualization techniques that allow for meaningful and insightful visualizations of MPEA composition spaces and property spaces. Our contribution to this suite are projections of entire alloy spaces for the purposes of design. We deploy this of visualization techniques on the following MPEA case studies: (1) constraint-satisfaction alloy design scheme, (2) Bayesian optimization alloy design campaigns, (3) and various other scenarios in the ESI. Furthermore, we show how this method can be applied to any barycentric design space. While there is no one-size-fits-all visualization technique, our toolbox offers a range of methods and best practices that can be tailored to specific MPEA research needs. This article is intended for materials scientists interested in performing research on multi-principal element alloys, chemically complex alloys, or high entropy alloys and is expected to facilitate the discovery of novel and tailored properties in MPEAs.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 181-194"},"PeriodicalIF":6.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00262h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Utkarsh Pratiush, Hiroshi Funakubo, Rama Vasudevan, Sergei V. Kalinin and Yongtao Liu
Microscopy plays a foundational role in materials science, biology, and nanotechnology, offering high-resolution imaging and detailed insights into properties at the nanoscale and atomic level. Microscopy automation via active machine learning approaches is a transformative advancement, offering increased efficiency, reproducibility, and the capability to perform complex experiments. Our previous work on autonomous experimentation with scanning probe microscopy (SPM) demonstrated an active learning framework using deep kernel learning (DKL) for structure–property relationship discovery. Here we extend this approach to a multi-stage decision process that incorporates prior knowledge and human interest into DKL-based workflows, and we operationalize these workflows in SPM. By integrating expected rewards from structure libraries or spectroscopic features, we enhance the exploration efficiency of autonomous microscopy, demonstrating more efficient and targeted exploration. These methods can be seamlessly applied to other microscopy and imaging techniques. Furthermore, the concept can be adapted for general Bayesian optimization in materials discovery across a broad range of autonomous experimental fields.
{"title":"Scientific exploration with expert knowledge (SEEK) in autonomous scanning probe microscopy with active learning†","authors":"Utkarsh Pratiush, Hiroshi Funakubo, Rama Vasudevan, Sergei V. Kalinin and Yongtao Liu","doi":"10.1039/D4DD00277F","DOIUrl":"https://doi.org/10.1039/D4DD00277F","url":null,"abstract":"<p >Microscopy plays a foundational role in materials science, biology, and nanotechnology, offering high-resolution imaging and detailed insights into properties at the nanoscale and atomic level. Microscopy automation <em>via</em> active machine learning approaches is a transformative advancement, offering increased efficiency, reproducibility, and the capability to perform complex experiments. Our previous work on autonomous experimentation with scanning probe microscopy (SPM) demonstrated an active learning framework using deep kernel learning (DKL) for structure–property relationship discovery. Here we extend this approach to a multi-stage decision process to incorporate prior knowledge and human interest into DKL-based workflows, we operationalize these workflows in SPM. By integrating expected rewards from structure libraries or spectroscopic features, we enhanced the exploration efficiency of autonomous microscopy, demonstrating more efficient and targeted exploration in autonomous microscopy. These methods can be seamlessly applied to other microscopy and imaging techniques. Furthermore, the concept can be adapted for general Bayesian optimization in material discovery across a broad range of autonomous experimental fields.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 252-263"},"PeriodicalIF":6.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arash Khajeh, Xiangyun Lei, Weike Ye, Zhenze Yang, Linda Hung, Daniel Schweigert and Ha-Kyung Kwon
In this work, we introduce a computational polymer discovery framework that efficiently designs polymers with tailored properties. The framework comprises three core components—a conditioned generative model, a computational evaluation module, and a feedback mechanism—all integrated into an iterative framework for material innovation. To demonstrate the efficacy of this framework, we used it to design polymer electrolyte materials with high ionic conductivity. A conditional generative model based on the minGPT architecture can generate candidate polymers that exhibit a mean ionic conductivity that is greater than that of the original training set. This approach, coupled with molecular dynamics (MD) simulations for testing and a specifically planned acquisition mechanism, allows the framework to refine its output iteratively. Notably, we observe an increase in both the mean and the lower bound of the ionic conductivity of the new polymer candidates. The framework's effectiveness is underscored by its identification of 14 distinct polymer repeating units that display a computed ionic conductivity surpassing that of polyethylene oxide (PEO).
{"title":"A materials discovery framework based on conditional generative models applied to the design of polymer electrolytes†","authors":"Arash Khajeh, Xiangyun Lei, Weike Ye, Zhenze Yang, Linda Hung, Daniel Schweigert and Ha-Kyung Kwon","doi":"10.1039/D4DD00293H","DOIUrl":"https://doi.org/10.1039/D4DD00293H","url":null,"abstract":"<p >In this work, we introduce a computational polymer discovery framework that efficiently designs polymers with tailored properties. The framework comprises three core components—a conditioned generative model, a computational evaluation module, and a feedback mechanism—all integrated into an iterative framework for material innovation. To demonstrate the efficacy of this framework, we used it to design polymer electrolyte materials with high ionic conductivity. A conditional generative model based on the minGPT architecture can generate candidate polymers that exhibit a mean ionic conductivity that is greater than that of the original training set. This approach, coupled with molecular dynamics (MD) simulations for testing and a specifically planned acquisition mechanism, allows the framework to refine its output iteratively. Notably, we observe an increase in both the mean and the lower bound of the ionic conductivity of the new polymer candidates. The framework's effectiveness is underscored by its identification of 14 distinct polymer repeating units that display a computed ionic conductivity surpassing that of polyethylene oxide (PEO).</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 11-20"},"PeriodicalIF":6.2,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00293h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quinn M. Gallagher and Michael A. Webb
Active learning and design–build–test–learn strategies are increasingly employed to accelerate materials discovery and characterization. Many data-driven materials design campaigns require that materials are synthesizable, stable, soluble, recyclable, or non-toxic. Resources are wasted when materials are recommended that do not satisfy these constraints. Acquiring this knowledge during the design campaign is inefficient, and many materials constraints transcend specific design objectives. However, there is no consensus on the most data-efficient algorithm for classifying whether a material satisfies a constraint. To address this gap, we comprehensively compare the performance of 100 strategies for classifying chemical and materials behavior. Performance is assessed across 31 classification tasks sourced from the literature in chemical and materials science. From these results, we recommend best practices for building data-efficient classifiers, showing that neural network- and random forest-based active learning algorithms are the most efficient across tasks. We also show that classification task complexity can be quantified by task metafeatures, most notably the noise-to-signal ratio. These metafeatures are then used to rationalize the data efficiency of different molecular representations and the impact of domain size on task complexity. Overall, this work provides a comprehensive survey of data-efficient classification strategies, identifies attributes of top-performing strategies, and suggests avenues for further study.
{"title":"Data efficiency of classification strategies for chemical and materials design†","authors":"Quinn M. Gallagher and Michael A. Webb","doi":"10.1039/D4DD00298A","DOIUrl":"https://doi.org/10.1039/D4DD00298A","url":null,"abstract":"<p >Active learning and design–build–test–learn strategies are increasingly employed to accelerate materials discovery and characterization. Many data-driven materials design campaigns require that materials are synthesizable, stable, soluble, recyclable, or non-toxic. Resources are wasted when materials are recommended that do not satisfy these constraints. Acquiring this knowledge during the design campaign is inefficient, and many materials constraints transcend specific design objectives. However, there is no consensus on the most data-efficient algorithm for classifying whether a material satisfies a constraint. To address this gap, we comprehensively compare the performance of 100 strategies for classifying chemical and materials behavior. Performance is assessed across 31 classification tasks sourced from the literature in chemical and materials science. From these results, we recommend best practices for building data-efficient classifiers, showing the neural network- and random forest-based active learning algorithms are most efficient across tasks. We also show that classification task complexity can be quantified by task metafeatures, most notably the noise-to-signal ratio. These metafeatures are then used to rationalize the data efficiency of different molecular representations and the impact of domain size on task complexity. Overall, this work provides a comprehensive survey of data-efficient classification strategies, identifies attributes of top-performing strategies, and suggests avenues for further study.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 135-148"},"PeriodicalIF":6.2,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00298a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joseph D. Clark, Xuenan Mi, Douglas A. Mitchell and Diwakar Shukla
Ribosomally synthesized and post-translationally modified peptide (RiPP) biosynthetic enzymes often exhibit promiscuous substrate preferences that cannot be reduced to simple rules. Large language models are promising tools for predicting the specificity of RiPP biosynthetic enzymes. However, state-of-the-art protein language models are trained on relatively few peptide sequences. A previous study comprehensively profiled the peptide substrate preferences of LazBF (a two-component serine dehydratase) and LazDEF (a three-component azole synthetase) from the lactazole biosynthetic pathway. We demonstrated that masked language modeling of LazBF substrate preferences produced language model embeddings that improved downstream prediction of both LazBF and LazDEF substrates. Similarly, masked language modeling of LazDEF substrate preferences produced embeddings that improved prediction of both LazBF and LazDEF substrates. Our results suggest that the models learned functional forms that are transferable between distinct enzymatic transformations that act within the same biosynthetic pathway. We found that a single high-quality data set of substrates and non-substrates for a RiPP biosynthetic enzyme improved substrate prediction for distinct enzymes in data-scarce scenarios. We then fine-tuned models on each data set and showed that the fine-tuned models provided interpretable insight that we anticipate will facilitate the design of substrate libraries that are compatible with desired RiPP biosynthetic pathways.
{"title":"Substrate prediction for RiPP biosynthetic enzymes <i>via</i> masked language modeling and transfer learning.","authors":"Joseph D Clark, Xuenan Mi, Douglas A Mitchell, Diwakar Shukla","doi":"10.1039/d4dd00170b","DOIUrl":"10.1039/d4dd00170b","url":null,"abstract":"<p><p>Ribosomally synthesized and post-translationally modified peptide (RiPP) biosynthetic enzymes often exhibit promiscuous substrate preferences that cannot be reduced to simple rules. Large language models are promising tools for predicting the specificity of RiPP biosynthetic enzymes. However, state-of-the-art protein language models are trained on relatively few peptide sequences. A previous study comprehensively profiled the peptide substrate preferences of LazBF (a two-component serine dehydratase) and LazDEF (a three-component azole synthetase) from the lactazole biosynthetic pathway. We demonstrated that masked language modeling of LazBF substrate preferences produced language model embeddings that improved downstream prediction of both LazBF and LazDEF substrates. Similarly, masked language modeling of LazDEF substrate preferences produced embeddings that improved prediction of both LazBF and LazDEF substrates. Our results suggest that the models learned functional forms that are transferable between distinct enzymatic transformations that act within the same biosynthetic pathway. We found that a single high-quality data set of substrates and non-substrates for a RiPP biosynthetic enzyme improved substrate prediction for distinct enzymes in data-scarce scenarios. We then fine-tuned models on each data set and showed that the fine-tuned models provided interpretable insight that we anticipate will facilitate the design of substrate libraries that are compatible with desired RiPP biosynthetic pathways.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" ","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622008/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brittany C. Haas, Melissa A. Hardy, Shree Sowndarya S. V., Keir Adams, Connor W. Coley, Robert S. Paton and Matthew S. Sigman
Data-driven reaction discovery and development is a growing field that relies on the use of molecular descriptors to capture key information about substrates, ligands, and targets. Broad adoption of this strategy is hindered by the associated computational cost of descriptor calculation, especially when considering conformational flexibility. Descriptor libraries can be precomputed agnostic of application to reduce the computational burden of data-driven reaction development. However, as one often applies these models to evaluate novel hypothetical structures, it would be ideal to predict the descriptors of compounds on-the-fly. Herein, we report DFT-level descriptor libraries for conformational ensembles of 8528 carboxylic acids and 8172 alkyl amines towards this goal. Employing 2D and 3D graph neural network architectures trained on these libraries culminated in the development of predictive models for molecule-level descriptors, as well as the bond- and atom-level descriptors for the conserved reactive site (carboxylic acid or amine). The predictions were confirmed to be robust for an external validation set of medicinally-relevant carboxylic acids and alkyl amines. Additionally, a retrospective study correlating the rate of amide coupling reactions demonstrated the suitability of the predicted DFT-level descriptors for downstream applications. Ultimately, these models enable high-fidelity predictions for a vast number of potential substrates, greatly increasing accessibility to the field of data-driven reaction development.
{"title":"Rapid prediction of conformationally-dependent DFT-level descriptors using graph neural networks for carboxylic acids and alkyl amines†","authors":"Brittany C. Haas, Melissa A. Hardy, Shree Sowndarya S. V., Keir Adams, Connor W. Coley, Robert S. Paton and Matthew S. Sigman","doi":"10.1039/D4DD00284A","DOIUrl":"10.1039/D4DD00284A","url":null,"abstract":"<p >Data-driven reaction discovery and development is a growing field that relies on the use of molecular descriptors to capture key information about substrates, ligands, and targets. Broad adaptation of this strategy is hindered by the associated computational cost of descriptor calculation, especially when considering conformational flexibility. Descriptor libraries can be precomputed agnostic of application to reduce the computational burden of data-driven reaction development. However, as one often applies these models to evaluate novel hypothetical structures, it would be ideal to predict the descriptors of compounds on-the-fly. Herein, we report DFT-level descriptor libraries for conformational ensembles of 8528 carboxylic acids and 8172 alkyl amines towards this goal. Employing 2D and 3D graph neural network architectures trained on these libraries culminated in the development of predictive models for molecule-level descriptors, as well as the bond- and atom-level descriptors for the conserved reactive site (carboxylic acid or amine). The predictions were confirmed to be robust for an external validation set of medicinally-relevant carboxylic acids and alkyl amines. Additionally, a retrospective study correlating the rate of amide coupling reactions demonstrated the suitability of the predicted DFT-level descriptors for downstream applications. Ultimately, these models enable high-fidelity predictions for a vast number of potential substrates, greatly increasing accessibility to the field of data-driven reaction development.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 222-233"},"PeriodicalIF":6.2,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11626426/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142814928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiajun Zhou, Yijie Yang, Austin M. Mroz and Kim E. Jelfs
Polymers play a crucial role in a wide array of applications due to their diverse and tunable properties. Establishing the relationship between polymer representations and their properties is crucial to the computational design and screening of potential polymers via machine learning. The quality of the representation significantly influences the effectiveness of these computational methods. Here, we present a self-supervised contrastive learning paradigm, PolyCL, for learning robust and high-quality polymer representations without the need for labels. Our model combines explicit and implicit augmentation strategies for improved learning performance. The results demonstrate that our model achieves better or highly competitive performance on transfer learning tasks as a feature extractor, without an overcomplicated training strategy or hyperparameter optimisation. To further enhance the efficacy of our model, we conducted extensive analyses of the augmentation combinations used in contrastive learning. This led to identifying the most effective combination to maximise PolyCL's performance.
{"title":"PolyCL: contrastive learning for polymer representation learning via explicit and implicit augmentations†","authors":"Jiajun Zhou, Yijie Yang, Austin M. Mroz and Kim E. Jelfs","doi":"10.1039/D4DD00236A","DOIUrl":"10.1039/D4DD00236A","url":null,"abstract":"<p >Polymers play a crucial role in a wide array of applications due to their diverse and tunable properties. Establishing the relationship between polymer representations and their properties is crucial to the computational design and screening of potential polymers <em>via</em> machine learning. The quality of the representation significantly influences the effectiveness of these computational methods. Here, we present a self-supervised contrastive learning paradigm, PolyCL, for learning robust and high-quality polymer representation without the need for labels. Our model combines explicit and implicit augmentation strategies for improved learning performance. The results demonstrate that our model achieves either better, or highly competitive, performances on transfer learning tasks as a feature extractor without an overcomplicated training strategy or hyperparameter optimisation. Further enhancing the efficacy of our model, we conducted extensive analyses on various augmentation combinations used in contrastive learning. This led to identifying the most effective combination to maximise PolyCL's performance.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 149-160"},"PeriodicalIF":6.2,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11616009/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziyue Zou, Dedi Wang and Pratyush Tiwary
Molecular dynamics simulations offer detailed insights into atomic motions but face timescale limitations. Enhanced sampling methods have addressed these challenges but even with machine learning, they often rely on pre-selected expert-based features. In this work, we present a Graph Neural Network-State Predictive Information Bottleneck (GNN-SPIB) framework, which combines graph neural networks and the state predictive information bottleneck to automatically learn low-dimensional representations directly from atomic coordinates. Tested on three benchmark systems, our approach predicts essential structural, thermodynamic and kinetic information for slow processes, demonstrating robustness across diverse systems. The method shows promise for complex systems, enabling effective enhanced sampling without requiring pre-defined reaction coordinates or input features.
{"title":"A graph neural network-state predictive information bottleneck (GNN-SPIB) approach for learning molecular thermodynamics and kinetics†","authors":"Ziyue Zou, Dedi Wang and Pratyush Tiwary","doi":"10.1039/D4DD00315B","DOIUrl":"https://doi.org/10.1039/D4DD00315B","url":null,"abstract":"<p >Molecular dynamics simulations offer detailed insights into atomic motions but face timescale limitations. Enhanced sampling methods have addressed these challenges but even with machine learning, they often rely on pre-selected expert-based features. In this work, we present a Graph Neural Network-State Predictive Information Bottleneck (GNN-SPIB) framework, which combines graph neural networks and the state predictive information bottleneck to automatically learn low-dimensional representations directly from atomic coordinates. Tested on three benchmark systems, our approach predicts essential structural, thermodynamic and kinetic information for slow processes, demonstrating robustness across diverse systems. The method shows promise for complex systems, enabling effective enhanced sampling without requiring pre-defined reaction coordinates or input features.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 211-221"},"PeriodicalIF":6.2,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00315b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Takayoshi Yoshimura, Hiromoto Kato, Shunto Oikawa, Taichi Inagaki, Shigehito Asano, Tetsunori Sugawara, Tomoyuki Miyao, Takamitsu Matsubara, Hiroharu Ajiro, Mikiya Fujii, Yu-ya Ohnishi and Miho Hatanaka
Polymer informatics, which involves applying data-driven science to polymers, has attracted considerable research interest. However, developing adequate descriptors for polymers, particularly copolymers, to facilitate machine learning (ML) models with limited datasets remains a challenge. To address this issue, we computed sets of parameters, including reaction energies and activation barriers of elementary reactions in the early stage of radical polymerization, for 2500 radical–monomer pairs derived from 50 commercially available monomers and constructed an open database named “Copolymer Descriptor Database”. Furthermore, we built ML models using our descriptors as explanatory variables and physical properties such as the reactivity ratio, monomer conversion, monomer composition ratio, and molecular weight as objective variables. These models achieved high predictive accuracy, demonstrating the potential of our descriptors to advance the field of polymer informatics.
{"title":"CopDDB: a descriptor database for copolymers and its applications to machine learning†","authors":"Takayoshi Yoshimura, Hiromoto Kato, Shunto Oikawa, Taichi Inagaki, Shigehito Asano, Tetsunori Sugawara, Tomoyuki Miyao, Takamitsu Matsubara, Hiroharu Ajiro, Mikiya Fujii, Yu-ya Ohnishi and Miho Hatanaka","doi":"10.1039/D4DD00266K","DOIUrl":"https://doi.org/10.1039/D4DD00266K","url":null,"abstract":"<p >Polymer informatics, which involves applying data-driven science to polymers, has attracted considerable research interest. However, developing adequate descriptors for polymers, particularly copolymers, to facilitate machine learning (ML) models with limited datasets remains a challenge. To address this issue, we computed sets of parameters, including reaction energies and activation barriers of elementary reactions in the early stage of radical polymerization, for 2500 radical–monomer pairs derived from 50 commercially available monomers and constructed an open database named “Copolymer Descriptor Database”. Furthermore, we built ML models using our descriptors as explanatory variables and physical properties such as the reactivity ratio, monomer conversion, monomer composition ratio, and molecular weight as objective variables. These models achieved high predictive accuracy, demonstrating the potential of our descriptors to advance the field of polymer informatics.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 195-203"},"PeriodicalIF":6.2,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00266k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zihe Li, Mengke Li, Yufeng Luo, Haibin Cao, Huijun Liu and Ying Fang
Efficient evaluation of lattice thermal conductivity (κL) is critical for applications ranging from thermal management to energy conversion. In this work, we propose a neural network (NN) model that allows ready and accurate prediction of the κL of crystalline materials at arbitrary temperature. It is found that the data-driven model exhibits a high coefficient of determination between the real and predicted κL. Beyond the initial dataset, the strong predictive power of the NN model is further demonstrated by checking several systems randomly selected from previous first-principles studies. Most importantly, our model can realize high-throughput screening on countless systems either inside or beyond the existing databases, which is very beneficial for accelerated discovery or design of new materials with desired κL.
{"title":"Machine learning for accelerated prediction of lattice thermal conductivity at arbitrary temperature","authors":"Zihe Li, Mengke Li, Yufeng Luo, Haibin Cao, Huijun Liu and Ying Fang","doi":"10.1039/D4DD00286E","DOIUrl":"https://doi.org/10.1039/D4DD00286E","url":null,"abstract":"<p >Efficient evaluation of lattice thermal conductivity (<em>κ</em><small><sub>L</sub></small>) is critical for applications ranging from thermal management to energy conversion. In this work, we propose a neural network (NN) model that allows ready and accurate prediction of the <em>κ</em><small><sub>L</sub></small> of crystalline materials at arbitrary temperature. It is found that the data-driven model exhibits a high coefficient of determination between the real and predicted <em>κ</em><small><sub>L</sub></small>. Beyond the initial dataset, the strong predictive power of the NN model is further demonstrated by checking several systems randomly selected from previous first-principles studies. Most importantly, our model can realize high-throughput screening on countless systems either inside or beyond the existing databases, which is very beneficial for accelerated discovery or design of new materials with desired <em>κ</em><small><sub>L</sub></small>.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 1","pages":" 204-210"},"PeriodicalIF":6.2,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00286e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}