Yusaku Nakajima, Kai Kawasaki, Yasuo Takeichi, Masashi Hamaya, Yoshitaka Ushiku and Kanta Ono
We demonstrate a novel mechanochemical synthesis method using a robotic powder grinding system that applies a precisely controlled and constant mechanical force. This approach significantly enhances reproducibility and enables detailed analysis of reaction pathways. Our results indicate that robotic force control can alter the reaction rate and influence the reaction pathway, highlighting its potential for elucidating chemical reaction mechanisms and fostering the discovery of new chemical reactions. Despite its significance, the application of a controllable constant force in macroscale mechanochemical synthesis remains challenging. To address this gap, we compared the reproducibilities of various mechanochemical syntheses using conventional manual grinding, ball milling, and our novel robotic approach with perovskite materials. Our findings indicate that the robotic approach provides significantly higher reproducibility than conventional methods, facilitating the analysis of reaction pathways. By manipulating the grinding force and speed, we revealed that robotic force control can alter both the reaction rate and pathway. Consequently, robotic mechanochemical synthesis has significant potential for advancing the understanding of chemical reaction mechanisms and discovering new reactions.
Title: Force-controlled robotic mechanochemical synthesis. Journal: Digital Discovery, vol. 10, pp. 2130–2136. Published: 2024-09-16. DOI: 10.1039/D4DD00189C. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00189c?page=search
Aleksandar Kondinski, Pavlo Rutkevych, Laura Pascazio, Dan N. Tran, Feroz Farazi, Srishti Ganguly and Markus Kraft
Zeolites are complex and porous crystalline inorganic materials that serve as hosts for a variety of molecular, ionic and cluster species. Formal, machine-actionable representation of this chemistry presents a challenge as a variety of concepts need to be semantically interlinked. This work demonstrates the potential of knowledge engineering in overcoming this challenge. We develop ontologies OntoCrystal and OntoZeolite, enabling the representation and instantiation of crystalline zeolite information into a dynamic, interoperable knowledge graph called The World Avatar (TWA). In TWA, crystalline zeolite instances are semantically interconnected with chemical species that act as guests in these materials. Information can be obtained via custom or templated SPARQL queries administered through a user-friendly web interface. Unstructured exploration is facilitated through natural language processing using the Marie System, showcasing promise for the blended large language model – knowledge graph approach in providing accurate responses on zeolite chemistry in natural language.
Title: Knowledge graph representation of zeolitic crystalline materials. Journal: Digital Discovery, vol. 10, pp. 2070–2084. Published: 2024-09-13. DOI: 10.1039/D4DD00166D. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00166d?page=search
Daniel Dalland, Linden Schrecker and King Kuok (Mimi) Hii
The ability and desire to collect kinetic data have greatly increased in recent years, requiring more automated and quantitative methods for analysis. In this work, an automated program (Auto-VTNA) is developed to simplify the kinetic analysis workflow. Auto-VTNA allows all the reaction orders to be determined concurrently, expediting the process of kinetic analysis. Auto-VTNA performs well on noisy or sparse data sets and can handle complex reactions involving multiple reaction orders. Quantitative error analysis and facile visualisation allow users to numerically justify and robustly present their findings. Auto-VTNA can be used through a free graphical user interface (GUI), requiring no coding or expert kinetic model input from the user, and can be customised and built on if required.
Title: Auto-VTNA: an automatic VTNA platform for determination of global rate laws. Journal: Digital Discovery, vol. 10, pp. 2118–2129. Published: 2024-09-13. DOI: 10.1039/D4DD00111G. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00111g?page=search
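As a rough illustration of the variable time normalisation analysis (VTNA) idea that the Auto-VTNA abstract refers to, the sketch below rescales the time axis by the integral of a species concentration raised to a trial order, then searches for the order at which product profiles from different experiments overlay. It is not the Auto-VTNA code or its API: the function names, the simulated data (a reaction that is first order in a catalyst of constant concentration), and the total-variation overlay score are all assumptions chosen to keep the example short.

```python
import math

def normalised_time(times, conc, order):
    """Cumulative trapezoidal integral of conc**order over time (the VTNA axis)."""
    axis = [0.0]
    for i in range(1, len(times)):
        mean_conc = 0.5 * (conc[i] + conc[i - 1])
        axis.append(axis[-1] + mean_conc ** order * (times[i] - times[i - 1]))
    return axis

def overlay_score(experiments, order):
    """Total variation of product concentration along the pooled, sorted axis.

    For monotonic product curves this is smallest when all curves overlay.
    (Illustrative stand-in for a proper overlay metric.)
    """
    points = []
    for times, conc, product in experiments:
        points.extend(zip(normalised_time(times, conc, order), product))
    points.sort()
    return sum(abs(b[1] - a[1]) for a, b in zip(points, points[1:]))

def simulate(cat, k=5.0, true_order=1.0):
    """Hypothetical data: product formation with rate constant k * cat**true_order."""
    times = [5.0 * i for i in range(21)]
    conc = [cat] * len(times)
    product = [1.0 - math.exp(-k * cat ** true_order * t) for t in times]
    return times, conc, product

# Two simulated experiments that differ only in catalyst concentration.
experiments = [simulate(0.01), simulate(0.02)]

# Grid search over trial orders from 0.0 to 2.0 in steps of 0.1.
orders = [i / 10 for i in range(21)]
best_order = min(orders, key=lambda n: overlay_score(experiments, n))
print(best_order)
```

With real, noisy data a smoother overlay metric and a finer or multi-dimensional order search would be needed; the grid search here only shows the principle for a single species.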
K. Barakati, Hui Yuan, Amit Goyal and S. V. Kalinin
The rise of electron microscopy has expanded our ability to acquire nanometer and atomically resolved images of complex materials. The resulting vast datasets are typically analyzed by human operators, an intrinsically challenging process due to the multiple possible analysis steps and the corresponding need to build and optimize complex analysis workflows. We present a methodology based on the concept of a Reward Function coupled with Bayesian Optimization to optimize image analysis workflows dynamically. The Reward Function is engineered to closely align with the experimental objectives and broader context and is quantifiable upon completion of the analysis. Here, cross-section, high-angle annular dark field (HAADF) images of ion-irradiated (Y, Dy)Ba₂Cu₃O₇₋δ thin films were used as a model system. The reward functions were formed based on the expected materials density and atomic spacings and used to drive multi-objective optimization of the classical Laplacian-of-Gaussian (LoG) method. These results can be benchmarked against the DCNN segmentation. This optimized LoG* compares favorably against DCNN in the presence of additional noise. We further extend the reward function approach towards the identification of partially disordered regions, creating a physics-driven reward function and action space of high-dimensional clustering. We posit that, with a correct definition, the reward function approach allows real-time optimization of complex analysis workflows at much higher speeds and lower computational costs than classical DCNN-based inference, ensuring the attainment of results that are both precise and aligned with the human-defined objectives.
Title: Physics-based reward driven image analysis in microscopy. Journal: Digital Discovery, vol. 10, pp. 2061–2069. Published: 2024-09-12. DOI: 10.1039/D4DD00132J. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00132j?page=search
Mohammad Amin Ghanavati, Soroush Ahmadi and Sohrab Rohani
The effectiveness of drug treatments depends significantly on the water solubility of compounds, influencing bioavailability and therapeutic outcomes. A reliable predictive solubility tool enables drug developers to swiftly identify drugs with low solubility and implement proactive solubility enhancement techniques. The current research proposes three predictive models based on four solubility datasets (ESOL, AQUA, PHYS, OCHEM), encompassing 3942 unique molecules. Three different molecular representations were obtained, including electrostatic potential (ESP) maps, molecular graphs, and tabular features (extracted from ESP maps and tabular Mordred descriptors). We conducted 3942 DFT calculations to acquire ESP maps and extract features from them. Subsequently, we applied two deep learning models, EdgeConv and Graph Convolutional Network (GCN), to the point cloud (ESP) and graph modalities of molecules. In addition, we utilized a random forest-based feature selection on tabular features, followed by mapping with XGBoost. A t-SNE analysis visualized chemical space across datasets and unique molecules, providing valuable insights for model evaluation. The proposed machine learning (ML)-based models, trained on 80% of each dataset and evaluated on the remaining 20%, showcased superior performance, particularly with XGBoost utilizing the extracted and selected tabular features. This yielded average test data Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R²) values of 0.458, 0.613, and 0.918, respectively. Furthermore, an ensemble of the three models showed improvement in error metrics across all datasets, consistently outperforming each individual model. This Ensemble model was also tested on the Solubility Challenge 2019, achieving an RMSE of 0.865 and outperforming 37 models with an average RMSE of 1.62. Transferability analysis of our work further indicated robust performance across different datasets. Additionally, SHAP explainability for the feature-based XGBoost model provided transparency in solubility predictions, enhancing the interpretability of the results.
Title: A machine learning approach for the prediction of aqueous solubility of pharmaceuticals: a comparative model and dataset analysis. Journal: Digital Discovery, vol. 10, pp. 2085–2104. Published: 2024-09-09. DOI: 10.1039/D4DD00065J. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00065j?page=search
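The three error metrics quoted in the solubility abstract (MAE, RMSE, and R²) follow standard definitions, which the short sketch below spells out. The solubility values are hypothetical numbers made up for the example, not data from the paper, and the function name is likewise an assumption.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical log-solubility values, for illustration only.
y_true = [-1.0, -2.0, -3.0, -4.0]
y_pred = [-1.5, -2.0, -2.5, -4.0]

mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

RMSE penalises large individual errors more strongly than MAE, which is why the two are usually reported together.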
We would like to take this opportunity to thank all of Digital Discovery’s reviewers for helping to preserve quality and integrity in chemical science literature. We would also like to highlight the Outstanding Reviewers for Digital Discovery in 2023.
Title: Outstanding Reviewers for Digital Discovery in 2023. Journal: Digital Discovery, vol. 10, p. 1922. Published: 2024-09-06. DOI: 10.1039/D4DD90037E. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd90037e?page=search
Jan Obořil, Christian P. Haas, Maximilian Lübbesmeyer, Rachel Nicholls, Thorsten Gressling, Klavs F. Jensen, Giulio Volpin and Julius Hillenbrand
Reaction screening and high-throughput experimentation (HTE) coupled with liquid chromatography (HPLC and UHPLC) are becoming more important than ever in synthetic chemistry. With a growing number of experiments, it is increasingly difficult to ensure correct peak identification and integration, especially due to unknown side components which often overlap with the peaks of interest. We developed an improved version of the MOCCA Python package with a web-based graphical user interface (GUI) for automated processing of chromatograms, including baseline correction, intelligent peak picking, peak purity checks, deconvolution of overlapping peaks, and compound tracking. The individual automatic processing steps have been improved compared to the previous version of MOCCA to make the software more dependable and versatile. The algorithm accuracy was benchmarked using three datasets and compared to the previous MOCCA implementation and published results. The processing is fully automated with the possibility to include calibration and internal standards. The software supports chromatograms with photo-diode array detector (DAD) data from most commercial HPLC systems, and the Python package and GUI implementation are open-source to allow addition of new features and further development.
Title: Automated processing of chromatograms: a comprehensive python package with a GUI for intelligent peak identification and deconvolution in chemical reaction analysis. Journal: Digital Discovery, vol. 10, pp. 2041–2051. Published: 2024-09-05. DOI: 10.1039/D4DD00214H. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00214h?page=search
Neural networks (NNs) accelerate simulations of quantum dissipative dynamics. Ensuring that these simulations adhere to fundamental physical laws is crucial, but has been largely ignored in the state-of-the-art NN approaches. We show that this may lead to implausible results measured by violation of the trace conservation. To recover the correct physical behavior, we develop physics-informed NNs (PINNs) that mitigate the violations to a good extent. Beyond that, we propose a novel uncertainty-aware approach that enforces perfect trace conservation by design, surpassing PINNs.
Title: Physics-informed neural networks and beyond: enforcing physical constraints in quantum dissipative dynamics. Authors: Arif Ullah, Yu Huang, Ming Yang and Pavlo O. Dral. Journal: Digital Discovery, vol. 10, pp. 2052–2060. Published: 2024-09-05. DOI: 10.1039/D4DD00153B. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00153b?page=search
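To make the trace-conservation constraint in the quantum dissipative dynamics abstract concrete: a physical reduced density matrix must have unit trace, so the deviation of the predicted trace from 1 is a direct measure of the violation. The sketch below computes that violation for a hypothetical two-state prediction and applies a naive post-hoc rescaling. The rescaling is only an illustration of the constraint and is not the PINN or the uncertainty-aware method described in the paper; the matrix values and function names are assumptions.

```python
def trace(rho):
    """Sum of the diagonal elements of a square matrix given as nested lists."""
    return sum(rho[i][i] for i in range(len(rho)))

def trace_violation(rho):
    """Absolute deviation of the trace from the physical value of 1."""
    return abs(trace(rho) - 1.0)

def renormalise(rho):
    """Naive fix: divide every element by the trace so that the trace becomes 1."""
    tr = trace(rho)
    return [[element / tr for element in row] for row in rho]

# Hypothetical network output for a two-state system (populations sum to 1.03).
rho_pred = [
    [0.62 + 0j, 0.10 - 0.05j],
    [0.10 + 0.05j, 0.41 + 0j],
]

print(trace_violation(rho_pred))               # violation before correction
print(trace_violation(renormalise(rho_pred)))  # violation after rescaling
```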
Baochen Li, Yuru Liu, Haibin Sun, Rentao Zhang, Yongli Xie, Klement Foo, Frankie S. Mak, Ruimao Zhang, Tianshu Yu, Sen Lin, Peng Wang and Xiaoxue Wang
As a fundamental problem in organic chemistry, synthesis planning aims at designing energy- and cost-efficient reaction pathways for target compounds. In synthesis planning, it is crucial to understand regioselectivity, or the preference of a reaction over competing reaction sites. Precisely predicting regioselectivity enables early exclusion of unproductive reactions and paves the way to designing high-yielding synthetic routes with minimal separation and material costs. However, combining chemical knowledge and data-driven methods to make practical predictions of regioselectivity is still at an emerging stage. At the same time, metal-catalyzed cross-coupling reactions have profoundly transformed medicinal chemistry, and have thus become one of the most frequently encountered types of reactions in synthesis planning. In this work, we introduce, for the first time, a chemical-knowledge-informed message passing neural network (MPNN) framework that directly identifies the intrinsic major products for metal-catalyzed cross-coupling reactions with regioselective ambiguity. Integrating both first-principles methods and data-driven methods, our model achieves an overall accuracy of 96.51% on the test set of eight typical metal-catalyzed cross-coupling reaction types, including Suzuki–Miyaura, Stille, Sonogashira, Buchwald–Hartwig, Hiyama, Kumada, Negishi, and Heck reactions, outperforming other commonly used model types. To integrate electronic effects with steric effects in regioselectivity prediction, we propose a quantitative method to measure the steric hindrance effect. Our steric hindrance checker can successfully identify regioselectivity induced solely by steric hindrance. Notably, under practical scenarios, our model outperforms six experimental organic chemists with an average working experience of over 10 years in the organic synthesis industry in terms of predicting major products in regioselective cases. We have also exemplified the practical usage of our model by fixing routes designed by open-access synthesis planning software and improving reactions by identifying low-cost starting materials. To assist general chemists in making prompt decisions about regioselectivity, we have developed a free web-based AI-empowered tool. Our code and web tool have been made available at https://github.com/Chemlex-AI/regioselectivity and https://ai.tools.chemlex.com/region-choose, respectively.
Title: Regio-MPNN: predicting regioselectivity for general metal-catalyzed cross-coupling reactions using a chemical knowledge informed message passing neural network. Journal: Digital Discovery, vol. 10, pp. 2019–2031. Published: 2024-09-03. DOI: 10.1039/D4DD00244J. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00244j?page=search
Michaela K. Loveless, Minwei Che, Alec J. Sanchez, Vikrant Tripathy, Bo W. Laursen, Sudhakar Pamidighantam, Krishnan Raghavachari and Amar H. Flood
Redox and optical data of organic fluorophores are essential for using design rules and property screening to identify new candidate dyes capable of forming optical materials. One such optical material is small-molecule, ionic isolation lattices (SMILES), which have properties defined by the optical and electrochemical properties of the fluorophores used. While optical data are available and readily extracted, the promise of digital discovery to mine the data and identify new dye candidates for making new fluorescent compounds is limited by experimental electrochemical data, which are reported with varying quality. We report methods to extract data from 20 000+ literature-reported dyes to generate a library of both redox and optical data comprising 206 dye-solvent entries. Wide heterogeneity in data collection and reporting practices necessitated a workflow involving manual data extraction, expert annotation of data quality, and validation. Chemometric analysis shows the distributions of solvents, electrolytes, and reference electrodes used in electrochemistry and the distributions of dye families and molecular weights. Data were extracted and screened to identify fluorophores predicted to form fluorescent solids based on SMILES. Screening used three design rules requiring dyes to be cationic, have a redox window within −1.9 and +1.5 V (vs. ferrocene), and have a size less than 2 nm. A set of 47 dyes complies with all design rules, showcasing the potential for using paired electrochemical-optical data in a workflow for designing optical materials.
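The three design rules stated in the abstract can be sketched as a simple filter. This is only an illustration under stated assumptions, not the authors' workflow: the `DyeEntry` fields, the example entries, and the reading of "redox window within −1.9 and +1.5 V" as "reduction potential no lower than −1.9 V and oxidation potential no higher than +1.5 V, bounds inclusive" are all hypothetical choices made here.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract (potentials in V vs. ferrocene, size in nm).
E_MIN_V = -1.9
E_MAX_V = 1.5
MAX_SIZE_NM = 2.0


@dataclass(frozen=True)
class DyeEntry:
    """Hypothetical record layout; the real library's schema is not given."""
    name: str
    charge: int      # net formal charge of the dye
    e_red_v: float   # reduction potential, V vs. ferrocene
    e_ox_v: float    # oxidation potential, V vs. ferrocene
    size_nm: float   # longest molecular dimension, nm


def passes_design_rules(dye: DyeEntry) -> bool:
    """Apply the three rules: cationic, redox window, size below 2 nm."""
    is_cationic = dye.charge > 0
    # Assumed interpretation: both potentials must lie inside the window.
    in_window = E_MIN_V <= dye.e_red_v and dye.e_ox_v <= E_MAX_V
    is_small = dye.size_nm < MAX_SIZE_NM
    return is_cationic and in_window and is_small


def screen(entries):
    """Return the entries that satisfy all three design rules."""
    return [dye for dye in entries if passes_design_rules(dye)]


# Made-up placeholder entries, not data from the paper.
example_entries = [
    DyeEntry("dye_A", +1, -1.2, 1.1, 1.4),  # satisfies all rules
    DyeEntry("dye_B", 0, -1.2, 1.1, 1.4),   # not cationic
    DyeEntry("dye_C", +1, -2.1, 1.1, 1.4),  # reduction outside window
    DyeEntry("dye_D", +1, -1.2, 1.1, 2.3),  # too large
    DyeEntry("dye_E", +1, -1.2, 1.7, 1.4),  # oxidation outside window
]

selected = screen(example_entries)
print([dye.name for dye in selected])
```

Keeping each rule as a named boolean makes it easy to report which rule excluded a given dye, which may be useful when the underlying electrochemical data vary in quality.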
{"title":"Extracting recalcitrant redox data on fluorophores to pair with optical data for predicting small-molecule, ionic isolation lattices†","authors":"Michaela K. Loveless, Minwei Che, Alec J. Sanchez, Vikrant Tripathy, Bo W. Laursen, Sudhakar Pamidighantam, Krishnan Raghavachari and Amar H. Flood","doi":"10.1039/D4DD00137K","DOIUrl":"10.1039/D4DD00137K","url":null,"abstract":"<p >Redox and optical data of organic fluorophores are essential for using design rules and property screening to identify new candidate dyes capable of forming optical materials. One such optical material is small-molecule, ionic isolation lattices (SMILES), which have properties defined by the optical and electrochemical properties of the fluorophores used. While optical data are available and readily extracted, the promise of digital discovery to mine the data and identify new dye candidates for making new fluorescent compounds is limited by experimental electrochemical data, which is reported with varying quality. We report methods to extract data from 20 000+ literature-reported dyes for generating a library of both redox and optical data constituted by 206 dye-solvent entries. Wide heterogeneity in data collection and reporting practices predicated use of a workflow involving manual data extraction, expert annotations of data quality and validation. Chemometric analysis shows distributions of solvents, electrolytes, and reference electrodes used in electrochemistry and the distributions of dye families and molecular weights. Data were extracted and screened to identify fluorophores predicted to form fluorescent solids based on SMILES. Screening used three design rules requiring dyes to be cationic, have a redox window within −1.9 and +1.5 V (<em>vs.</em> ferrocene), and a size less than 2 nm. 
A set of 47 dyes are compliant with all design rules showcasing the potential for using paired electrochemical-optical data in a workflow for designing optical materials.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 2105-2117"},"PeriodicalIF":6.2,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00137k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}