Sanghoon Lee, Kevin Cruse, Samuel P. Gleason, A. Paul Alivisatos, Gerbrand Ceder and Anubhav Jain
Gold nanoparticles (AuNPs) are widely used functional nanomaterials that exhibit adjustable properties depending on their shapes and sizes. Creating a comprehensive dataset of AuNP syntheses is useful for understanding how to control their morphology and size. Here, we employed search-based algorithms and fine-tuned the Llama-2 large language model to extract 492 multi-sourced seed-mediated AuNP synthesis recipes from the literature. With this dataset, which we share online, we verified that the type of seed capping agent, such as CTAB or citrate, plays a crucial role in determining the morphology of the AuNPs, aligning with established findings in the field. We also observe a weak correlation between the final gold nanorod (AuNR) aspect ratio and silver concentration, although a large variance reduces the significance of this relationship. Overall, our work demonstrates the value of literature-based datasets in advancing knowledge in the field of nanomaterial synthesis for further exploration and better reproducibility.
"Data-driven analysis of text-mined seed-mediated syntheses of gold nanoparticles", Sanghoon Lee, Kevin Cruse, Samuel P. Gleason, A. Paul Alivisatos, Gerbrand Ceder and Anubhav Jain. Digital discovery, volume 1, pages 93-104. Published 2024-11-22. DOI: 10.1039/D4DD00158C, https://doi.org/10.1039/D4DD00158C
Frédéric Beaupré, Anthony Bilodeau, Theresa Wiesner, Gabriel Leclerc, Mado Lemieux, Gabriel Nadeau, Katrine Castonguay, Bolin Fan, Simon Labrecque, Renée Hložek, Paul De Koninck, Christian Gagné and Flavie Lavoie-Cardinal
Ca²⁺ imaging methods are widely used for studying cellular activity in the brain, allowing detailed analysis of dynamic processes across various scales. Enhanced by high-contrast optical microscopy and fluorescent Ca²⁺ sensors, this technique can be used to reveal localized Ca²⁺ fluctuations within neurons, including in sub-cellular structures, such as the dendritic shaft or spines. Despite advances in Ca²⁺ sensors, the analysis of miniature Synaptic Calcium Transients (mSCTs), characterized by variability in morphology and low signal-to-noise ratios, remains challenging. Traditional threshold-based methods struggle with the detection and segmentation of these small, dynamic events. Deep learning (DL) approaches offer promising solutions but are limited by the need for large annotated datasets. Positive Unlabeled (PU) learning addresses this limitation by leveraging unlabeled instances to increase dataset size and enhance performance. This approach is particularly useful in the case of mSCTs, which are scarce and small and account for a very small proportion of the foreground pixels. PU learning significantly increases the effective size of the training dataset, improving model performance. Here, we present a PU learning-based strategy for detecting and segmenting mSCTs in cultured rat hippocampal neurons. We evaluate the performance of two 3D deep learning models, StarDist-3D and 3D U-Net, which are well established for the segmentation of small volumetric structures in microscopy datasets. By integrating PU learning, we enhance the 3D U-Net's performance, demonstrating significant gains over traditional methods. This work pioneers the application of PU learning in Ca²⁺ imaging analysis, offering a robust framework for mSCT detection and segmentation. We also demonstrate how this quantitative analysis pipeline can be used for subsequent mSCT feature analysis. We characterize morphological and kinetic changes of mSCTs associated with the application of chemical long-term potentiation (cLTP) stimulation in cultured rat hippocampal neurons. Our data-driven approach shows that a cLTP-inducing stimulus leads to the emergence of new active dendritic regions and affects mSCT subtypes differently.
"Quantitative analysis of miniature synaptic calcium transients using positive unlabeled deep learning", Frédéric Beaupré, Anthony Bilodeau, Theresa Wiesner, Gabriel Leclerc, Mado Lemieux, Gabriel Nadeau, Katrine Castonguay, Bolin Fan, Simon Labrecque, Renée Hložek, Paul De Koninck, Christian Gagné and Flavie Lavoie-Cardinal. Digital discovery, volume 1, pages 105-119. Published 2024-11-20. DOI: 10.1039/D4DD00197D, https://doi.org/10.1039/D4DD00197D
With advancements in computational molecular modeling and powerful structure search methods, it is now possible to systematically screen crystal structures for small organic molecules. In this context, we introduce the Python package High-Throughput Organic Crystal Structure Prediction (HTOCSP), which enables the prediction and screening of crystal packing for small organic molecules in an automated, high-throughput manner. Specifically, we describe the workflow, which encompasses molecular analysis, force field generation, and crystal generation and sampling, all within customized constraints based on user input. We demonstrate the application of HTOCSP by systematically screening organic crystals for 100 molecules using different sampling strategies and force field options. Furthermore, we analyze the benchmark results to understand the underlying factors that may influence the complexity of the crystal energy landscape. Finally, we discuss the current limitations of the package and potential future extensions.
"Automated high-throughput organic crystal structure prediction via population-based sampling", Qiang Zhu and Shinnosuke Hattori. Digital discovery, volume 1, pages 120-134. Published 2024-11-20. DOI: 10.1039/D4DD00264D, https://doi.org/10.1039/D4DD00264D
The accurate determination of a molecule's accessible conformations is key to the success of studying its properties. Traditional computational methods for exploring the conformational space of molecules, such as molecular dynamics simulations, however, require substantial computational resources and time. Recently, deep generative models have made significant progress in various fields, harnessing their powerful learning capabilities for complex data distributions. This makes them highly applicable in molecular conformation generation. In this study, we developed ConfGAN, a conformation generation model based on conditional generative adversarial networks. We designed an efficient molecular-motif graph representation that treats molecules as compositions of functional groups, captures interactions between groups, and provides rich chemical prior knowledge for conformation generation. During adversarial training, the generator network takes molecular graphs as input and attempts to generate stable conformations with minimal potential energy. The discriminator provides feedback based on energy differences, guiding the generation of conformations that comply with chemical rules. This model explicitly encodes molecular knowledge, ensuring the physical plausibility of generated conformations. Through extensive evaluation, ConfGAN has demonstrated superior performance compared to existing deep learning-based models. Furthermore, conformations generated by ConfGAN have demonstrated potential applications in related fields such as molecular docking and electronic property calculations.
"Generation of molecular conformations using generative adversarial neural networks", Congsheng Xu, Xiaomei Deng, Yi Lu and Peiyuan Yu. Digital discovery, volume 1, pages 161-171. Published 2024-11-19. DOI: 10.1039/D4DD00179F, https://doi.org/10.1039/D4DD00179F
Sherif Abdulkader Tawfik, Tri Minh Nguyen, Salvy P. Russo, Truyen Tran, Sunil Gupta and Svetha Venkatesh
At the heart of the flourishing field of machine learning potentials are graph neural networks, where deep learning is interwoven with physics-informed machine learning (PIML) architectures. Various PIML models, upon training with density functional theory (DFT) material structure–property datasets, have achieved unprecedented prediction accuracy for a range of molecular and material properties. A critical component in the learned graph representation of crystal structures in PIMLs is how the various fragments of the structure's graph are embedded in a neural network. Several of the state-of-the-art PIML models apply spherical harmonic functions. Such functions are based on the assumption that DFT computes the Coulomb potential of atom–atom interactions. However, DFT does not directly compute such potentials, but integrates the electron–atom potentials. We introduce the direct integration of the external potential (DIEP) method, which more faithfully reflects the actual computational workflow in DFT. DIEP integrates the external (electron–atom) potential and uses these quantities to embed the structure graph into a deep learning model. We demonstrate the enhanced accuracy of the DIEP model in predicting the energies of pristine and defective materials. By training DIEP to predict the potential energy surface, we show the ability of the model to predict the onset of fracture of pristine and defective carbon nanotubes.
"Embedding material graphs using the electron-ion potential: application to material fracture", Sherif Abdulkader Tawfik, Tri Minh Nguyen, Salvy P. Russo, Truyen Tran, Sunil Gupta and Svetha Venkatesh. Digital discovery, volume 12, pages 2618-2627. Published 2024-11-08. DOI: 10.1039/D4DD00246F, https://doi.org/10.1039/D4DD00246F
Jean-Charles Cousty, Tanguy Cavagna, Alec Schmidt, Edy Mariano, Keyan Villat, Florian de Nanteuil and Pascal Miéville
This paper presents GLAS (Git-based Lab Automated Scheduler or Get Lab Automation Simplified), an open-source, robust, and highly expandable Git-based architecture designed for laboratory automation. GLAS can be deployed in both partially and fully automated experimental science laboratories, enabling the development of a multi-layer scheduling system while maintaining a systematic architecture grounded in a Git repository. We demonstrate the applicability of GLAS through case studies from the Swiss Cat+ automated chemistry laboratory, showcasing its versatility and potential for widespread applicability in various laboratory automation contexts. By offering an open-source scheduling environment, our aim is to foster the development of accessible and adaptable laboratory automation solutions within the scientific community.
"GLAS: an open-source easily expandable Git-based scheduling architecture for integral lab automation", Jean-Charles Cousty, Tanguy Cavagna, Alec Schmidt, Edy Mariano, Keyan Villat, Florian de Nanteuil and Pascal Miéville. Digital discovery, volume 12, pages 2434-2447. Published 2024-11-08. DOI: 10.1039/D4DD00253A, https://doi.org/10.1039/D4DD00253A
Experimentally validated routes to synthetic compounds can be compared to each other by quantitative metrics (step count, yield, atom economy), or by qualitative assessments (strategy, novelty). AI-predicted routes are typically compared to experimental syntheses to check for an exact match among the top-ranked predictions (top-N accuracy). This method is ideal for the evaluation of retrosynthetic algorithms on large datasets (>10⁶ routes), but it cannot assess a degree of similarity between routes, which would be desirable for small datasets (<10² routes). Here, we present a simple method to calculate a similarity score between any two synthetic routes to a given molecule. The score is based on two concepts: which bonds are formed during the synthesis; and how the atoms of the final compound are grouped together throughout the synthesis. As a result, the similarity score overlaps well with chemists' intuition and provides a finer assessment of prediction accuracy.
"A simple similarity metric for comparing synthetic routes", Samuel Genheden and Jason D. Shields. Digital discovery, volume 1, pages 46-53. Published 2024-11-06. DOI: 10.1039/D4DD00292J, https://doi.org/10.1039/D4DD00292J
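The top-N accuracy mentioned in the abstract above can be sketched in a few lines. This is only an illustration of the exact-match baseline, not the authors' similarity score: the function name `top_n_accuracy` and the representation of each route as a hashable identifier (for example, a canonical string) are assumptions made for the example.

```python
from typing import Hashable, Sequence


def top_n_accuracy(
    predictions: Sequence[Sequence[Hashable]],
    references: Sequence[Hashable],
    n: int,
) -> float:
    """Fraction of targets whose reference route appears among the first n predictions.

    predictions[i] is a ranked list of predicted routes for target i (best first).
    references[i] is the experimental route for the same target.
    Routes are compared by exact equality, so a near miss counts as a full miss.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        1 for ranked, ref in zip(predictions, references) if ref in ranked[:n]
    )
    return hits / len(references)


# Hypothetical route identifiers, used only to exercise the function.
predicted = [["route_a", "route_b", "route_c"], ["route_d", "route_e"]]
experimental = ["route_b", "route_x"]
print(top_n_accuracy(predicted, experimental, n=1))
print(top_n_accuracy(predicted, experimental, n=2))
```

Because each target contributes either 0 or 1, the metric cannot distinguish a prediction that differs from the reference by one step from one that shares nothing with it, which is the gap a graded similarity score is meant to fill.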
Stefano Racioppi, Alberto Otero-de-la-Roza, Samad Hajinazar and Eva Zurek
Experimentally obtained powder X-ray diffraction (PXRD) patterns can be difficult to solve, precluding the full characterization of materials, pharmaceuticals, and geological compounds. Herein, we propose a method based upon a multi-objective evolutionary search that uses both a structure's enthalpy and similarity to a reference PXRD pattern (constituted by a list of peak positions and their intensities) to facilitate structure solution of inorganic systems. Because the similarity index is computed for locally optimized cells that are subsequently distorted to find the best match with the reference, this process transcends both computational (e.g., choice of theoretical method, and 0 K approximation) and experimental (e.g., external stimuli, and metastability) limitations. We illustrate how the proposed methodology can be employed to successfully uncover complex crystal structures by applying it to a range of test cases, including inorganic minerals, elements ramp-compressed to extreme conditions, and molecular crystals. The results demonstrate that our approach not only improves the accuracy of structure prediction, but also significantly reduces the time required to achieve reliable solutions, thus providing a powerful tool for the advancement of materials science and related fields.
"Powder X-ray diffraction assisted evolutionary algorithm for crystal structure prediction", Stefano Racioppi, Alberto Otero-de-la-Roza, Samad Hajinazar and Eva Zurek. Digital discovery, volume 1, pages 73-83. Published 2024-11-06. DOI: 10.1039/D4DD00269E, https://doi.org/10.1039/D4DD00269E
Jin Da Tan, Andre K. Y. Low, Shannon Thoi Rui Ying, Sze Yu Tan, Wenguang Zhao, Yee-Fun Lim, Qianxiao Li, Saif A. Khan, Balamurugan Ramalingam and Kedar Hippalgaonkar
The properties of polymers are primarily influenced by their monomer constituents, functional groups, and their mode of linkages. Copolymers, synthesized from multiple monomers, offer unique material properties compared to their homopolymers. Optimizing the synthesis of terpolymers is a complex and labor-intensive task due to variations in monomer reactivity and their compositional shifts throughout the polymerization process. The present work focuses on synthesizing a new terpolymer from styrene, myrcene, and dibutyl itaconate (DBI) monomers with the goal of achieving a high glass transition temperature (Tg) in the resulting terpolymer. While the copolymerization of pairwise combinations of styrene, myrcene, and DBI has been previously investigated, the terpolymerization of all three at once remains unexplored. Terpolymers with monomers like styrene would provide high glass transition temperatures, as the resultant polymers exhibit a rigid glassy state at ambient temperatures. Conversely, minimizing styrene incorporation reduces reliance on petrochemical-derived monomer sources for terpolymer synthesis, thus enhancing the sustainability of terpolymer usage. To balance the objectives of maximizing Tg while minimizing styrene incorporation, we employ multi-objective Bayesian optimization to efficiently sample in a design space comprising 5 experimental parameters. We perform two iterations of optimization for a total of 89 terpolymers, reporting terpolymers with a Tg above ambient temperature while retaining less than 50% styrene incorporation. This underscores the potential for exploring and utilizing renewable monomers such as myrcene and DBI to foster sustainability in polymer synthesis. Additionally, the dataset enables the calculation of ternary reactivity ratios using a system of ordinary differential equations based on the terminal model, providing valuable insights into the reactivity of monomers in complex ternary systems compared to binary copolymer systems. This approach reveals the nuanced kinetics of terpolymerization, further informing the synthesis of polymers with desired properties.
{"title":"Multi-objective synthesis optimization and kinetics of a sustainable terpolymer†","authors":"Jin Da Tan, Andre K. Y. Low, Shannon Thoi Rui Ying, Sze Yu Tan, Wenguang Zhao, Yee-Fun Lim, Qianxiao Li, Saif A. Khan, Balamurugan Ramalingam and Kedar Hippalgaonkar","doi":"10.1039/D4DD00233D","DOIUrl":"https://doi.org/10.1039/D4DD00233D","journal":{"name":"Digital discovery","volume":"12","pages":"2628-2636"},"PeriodicalIF":6.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00233d?page=search"}
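The terminal-model idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' fitting procedure: it only shows the forward problem, assuming the reactivity ratios are already known. Under the terminal model, a chain ending in monomer i adds monomer j with a probability that depends only on i, so the instantaneous terpolymer composition is the stationary distribution of a 3 × 3 transition matrix, and composition drift follows from integrating the monomer balance over conversion. The function names, the explicit Euler integrator, and the reactivity-ratio values below are all hypothetical and chosen for illustration only.

```python
# Sketch of terminal-model composition drift for an n-monomer system.
# Assumption: r[i][j] = k_ii / k_ij (reactivity ratio), with r[i][i] = 1.
# All numerical values here are made up for illustration.

def transition_matrix(f, r):
    """Probability p[i][j] that a chain ending in monomer i adds monomer j,
    given monomer mole fractions f in the feed."""
    n = len(f)
    rows = []
    for i in range(n):
        weights = [f[j] / r[i][j] for j in range(n)]  # proportional to k_ij * f_j
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def instantaneous_composition(f, r, iters=100):
    """Instantaneous polymer composition F, obtained as the stationary
    distribution of the terminal-model Markov chain (power iteration)."""
    n = len(f)
    p = transition_matrix(f, r)
    comp = [1.0 / n] * n
    for _ in range(iters):
        comp = [sum(comp[i] * p[i][j] for i in range(n)) for j in range(n)]
    return comp


def composition_drift(m0, r, conversion=0.5, steps=500):
    """Integrate dM_i/dn = -F_i(f) with explicit Euler steps, where n is the
    total amount of monomer converted. Returns the remaining monomer amounts."""
    m = list(m0)
    dn = conversion * sum(m0) / steps
    for _ in range(steps):
        total = sum(m)
        f = [x / total for x in m]
        comp = instantaneous_composition(f, r)
        m = [max(x - c * dn, 0.0) for x, c in zip(m, comp)]
    return m


if __name__ == "__main__":
    # Hypothetical reactivity ratios for three monomers (not from the paper).
    ratios = [[1.0, 0.5, 0.8],
              [2.0, 1.0, 1.5],
              [1.2, 0.7, 1.0]]
    remaining = composition_drift([1.0, 1.0, 1.0], ratios, conversion=0.5)
    print("remaining monomer amounts:", remaining)
```

For two monomers this construction reduces to the Mayo-Lewis copolymer equation, which gives a convenient check. Estimating ternary reactivity ratios from data, as the abstract describes, would wrap a model of this kind in a fitting loop; that step is not shown here.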
Last year, a preprint gained notoriety, proposing that a k-nearest neighbour classifier is able to outperform large-language models using compressed text as input and normalised compression distance (NCD) as a metric. In chemistry and biochemistry, molecules are often represented as strings, such as SMILES for small molecules or single-letter amino acid sequences for proteins. Here, we extend the previously introduced approach with support for regression and multitask classification and subsequently apply it to the prediction of molecular properties and protein–ligand binding affinities. We further propose converting numerical descriptors into string representations, enabling the integration of text input with domain-informed numerical descriptors. Finally, we show that the method can achieve performance competitive with chemical fingerprint- and GNN-based methodologies in general, and perform better than comparable methods on quantum chemistry and protein–ligand binding affinity prediction tasks.
{"title":"Learning on compressed molecular representations","authors":"Jan Weinreich and Daniel Probst","doi":"10.1039/D4DD00162A","DOIUrl":"https://doi.org/10.1039/D4DD00162A","journal":{"name":"Digital discovery","volume":"1","pages":"84-92"},"PeriodicalIF":6.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00162a?page=search"}
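The compression-based k-nearest-neighbour idea described in the abstract above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it assumes gzip as the compressor, plain string concatenation for the joint term, a majority vote for classification, and an unweighted neighbour mean for regression. The function names and the toy SMILES data are hypothetical.

```python
import gzip
from collections import Counter


def compressed_length(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalised compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_length(x), compressed_length(y)
    cxy = compressed_length(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest(query: str, train, k: int):
    """The k training pairs (string, target) closest to query under NCD."""
    return sorted(train, key=lambda item: ncd(query, item[0]))[:k]


def knn_classify(query: str, train, k: int = 3):
    """Majority vote over the labels of the k nearest training strings."""
    votes = Counter(label for _, label in nearest(query, train, k))
    return votes.most_common(1)[0][0]


def knn_regress(query: str, train, k: int = 3) -> float:
    """Unweighted mean of the targets of the k nearest training strings
    (an assumed aggregation rule, chosen for simplicity)."""
    neighbours = nearest(query, train, k)
    return sum(target for _, target in neighbours) / len(neighbours)


if __name__ == "__main__":
    # Toy SMILES strings with made-up labels, for illustration only.
    data = [("CCO", "alcohol"), ("CCCO", "alcohol"), ("c1ccccc1", "aromatic")]
    print(knn_classify("CCCCO", data, k=1))
```

Because the distance needs only a compressor and strings, the same functions apply unchanged to amino acid sequences or to numerical descriptors that have been written out as text. On very short strings the fixed gzip header dominates the compressed length, so distances between small molecules are coarse.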