With the breakthroughs in protein structure prediction technology, constructing atomic structures from cryo-electron microscopy (cryo-EM) density maps through structural fitting has become increasingly critical. However, the accuracy of the constructed models heavily relies on the precision of the structure-to-map fitting. In this study, we introduce DEMO-EMfit, a progressive method that integrates deep learning-based backbone map extraction with a global-local structural pose search to fit atomic structures into density maps. DEMO-EMfit was extensively evaluated on a benchmark data set comprising both cryo-electron tomography (cryo-ET) and cryo-EM maps of protein and nucleic acid complexes. The results demonstrate that DEMO-EMfit outperforms state-of-the-art approaches, offering an efficient and accurate tool for fitting atomic structures into density maps.
Fitting Atomic Structures into Cryo-EM Maps by Coupling Deep Learning-Enhanced Map Processing with Global-Local Optimization. Yaxian Cai, Ziying Zhang, Xiangyu Xu, Liang Xu, Yu Chen, Guijun Zhang* and Xiaogen Zhou*. Journal of Chemical Information and Modeling 2025, 65 (7), 3800–3811. Pub Date : 2025-03-28 DOI: 10.1021/acs.jcim.5c00004
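The idea of scoring candidate poses against a density map can be illustrated with a deliberately simplified sketch. It is not DEMO-EMfit: it omits the deep learning-based backbone map extraction, rotations, and local refinement, and only scans integer translations exhaustively using a plain Pearson correlation between a rasterized model and a target map. The grid size, the atom coordinates, and all function names are hypothetical choices made for illustration.

```python
import itertools
import math

def rasterize(points, shape):
    """Count atoms per voxel on a grid of the given shape (unit voxel size)."""
    nx, ny, nz = shape
    grid = [0.0] * (nx * ny * nz)
    for x, y, z in points:
        i, j, k = int(round(x)), int(round(y)), int(round(z))
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            grid[(i * ny + j) * nz + k] += 1.0
    return grid

def correlation(a, b):
    """Pearson correlation between two flattened maps; 0.0 if either is flat."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def best_translation(points, target, shape, max_shift=3):
    """Exhaustively scan integer shifts and return (best_shift, best_score)."""
    best = (None, -2.0)
    shifts = range(-max_shift, max_shift + 1)
    for dx, dy, dz in itertools.product(shifts, repeat=3):
        moved = [(x + dx, y + dy, z + dz) for x, y, z in points]
        score = correlation(rasterize(moved, shape), target)
        if score > best[1]:
            best = ((dx, dy, dz), score)
    return best

# Hypothetical four-atom model and a target map built by shifting it by (2, 3, 1).
shape = (8, 8, 8)
model = [(1, 1, 1), (2, 1, 1), (2, 2, 1), (2, 2, 2)]
target = rasterize([(x + 2, y + 3, z + 1) for x, y, z in model], shape)
shift, score = best_translation(model, target, shape)
```

In this toy setting the scan recovers the shift used to build the target. A practical fitting method would also need to search rotations and to cope with noise and limited map resolution.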
Pub Date : 2025-03-28 DOI: 10.1021/acs.jcim.5c00167
Sargol Mazraedoost, Hadi Sedigh Malekroodi, Petar Žuvela, Myunggi Yi and J. Jay Liu*
Accurate retention time (RT) prediction in liquid chromatography remains a significant consideration in molecular analysis. In this study, we explore the use of a transformer-based language model to predict RTs by treating simplified molecular input line entry system (SMILES) sequences as textual input, an approach that has not been previously utilized in this field. Our architecture combines a pretrained RoBERTa (robustly optimized BERT approach, a variant of BERT) with bidirectional long short-term memory (BiLSTM) networks to predict retention times in reversed-phase high-performance liquid chromatography (RP-HPLC). The METLIN small molecule retention time (SMRT) data set, comprising 77,980 small molecules after preprocessing, was encoded using SMILES notation and processed through a tokenizer to enable molecular representation as sequential data. The proposed transformer-LSTM architecture incorporates layer fusion from multiple transformer layers and bidirectional sequence processing, achieving superior performance compared to existing methods with a mean absolute error (MAE) of 26.23 s, a mean absolute percentage error (MAPE) of 3.25%, and an R-squared (R²) value of 0.91. The model’s explainability was demonstrated through attention visualization, revealing its focus on key molecular features that can influence RT. Furthermore, we evaluated the model’s transfer learning capabilities across ten data sets from the PredRet database, demonstrating robust performance across different chromatographic conditions with consistent improvement over previous approaches. Our results suggest that the hybrid model presents a valuable approach for predicting RT in liquid chromatography, with potential applications in metabolomics and small molecule analysis.
Prediction of Chromatographic Retention Time of a Small Molecule from SMILES Representation Using a Hybrid Transformer-LSTM Model. Journal of Chemical Information and Modeling 2025, 65 (7), 3343–3356.
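For readers unfamiliar with the three error measures quoted in the abstract (MAE, MAPE, and R²), the following minimal sketch shows their standard definitions. The retention times used here are made-up illustrative values, not data from the paper, and the function name `regression_metrics` is an assumption.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MAPE in percent, R^2) for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean absolute error, in the units of the inputs (seconds here).
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    # Mean absolute percentage error; assumes no observed value is zero.
    mape = 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    # Coefficient of determination: 1 - residual SS / total SS.
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, mape, r2

# Hypothetical retention times in seconds (illustrative only).
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
mae, mape, r2 = regression_metrics(observed, predicted)
```

MAE is reported in seconds, so it depends on the length of the chromatographic run, whereas MAPE and R² are scale-free. That is one reason such papers usually quote all three.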
Pub Date : 2025-03-27 DOI: 10.1021/acs.jcim.4c02364
Adrian Racki, Kamil Paduszyński
This paper reviews the recent and most impactful advancements in the application of artificial neural networks (ANNs) in modeling the properties of ionic liquids (ILs). As salts that are liquid at temperatures below 100 °C, ILs possess unique properties beneficial for various industrial applications such as carbon capture, catalytic solvents, and lubricant additives. The study emphasizes the challenges in selecting appropriate ILs due to the vast variability in their properties, which depend significantly on their cation and anion structures. The review discusses the advantages of using ANNs, including feed-forward, cascade-forward, convolutional, recurrent, and graph neural networks, over traditional machine learning algorithms for predicting the thermodynamic and physical properties of ILs. The paper also highlights the importance of data preparation, including data collection, feature engineering, and data cleaning, in developing accurate predictive models. Additionally, the review covers the interpretability of these models using techniques such as SHapley Additive exPlanations to understand feature importance. The authors conclude by discussing future opportunities and the potential of combining ANNs with other computational methods to design new ILs with targeted properties.
Recent Advances in the Modeling of Ionic Liquids Using Artificial Neural Networks. Journal of Chemical Information and Modeling 2025, 65 (7), 3161–3175. PDF: https://pubs.acs.org/doi/epdf/10.1021/acs.jcim.4c02364
Pub Date : 2025-03-27 DOI: 10.1021/acs.jcim.5c00022
Lin Feng, Xiangzheng Fu, Zhenya Du, Yuting Guo, Linlin Zhuo*, Yan Yang, Dongsheng Cao* and Xiaojun Yao*
Cardiotoxicity refers to the inhibitory effects of drugs on cardiac ion channels. Accurate prediction of cardiotoxicity is crucial yet challenging, as it directly impacts the evaluation of cardiac drug efficacy and safety. Numerous methods have been developed to predict cardiotoxicity, yet their performance remains limited. A key limitation is that these methods often rely solely on single-modal data, making multimodal data integration challenging. To address this, we present a multimodal method integrating molecular SMILES, structure, and fingerprint representations to enhance cardiotoxicity prediction. First, we designed a fusion layer to unify representations from different modalities. During training, the model maximizes intramodal similarity for the same molecule while minimizing intermolecular similarity, ensuring consistent cross-modal representations. This study evaluates the inhibitory effects of candidate drugs on voltage-gated potassium (hERG), sodium (Nav1.5), and calcium (Cav1.2) channels. Experimental results demonstrate that the proposed model significantly outperforms existing state-of-the-art methods in cardiotoxicity prediction. We anticipate that this model will contribute significantly to the development and safety evaluation of cardiac drugs, reducing cardiotoxicity-related risks.
MultiCTox: Empowering Accurate Cardiotoxicity Prediction through Adaptive Multimodal Learning. Journal of Chemical Information and Modeling 2025, 65 (7), 3517–3528.
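The training objective described in the abstract (pull together the representations of the same molecule, push apart those of different molecules) is commonly expressed as an InfoNCE-style contrastive loss. The sketch below illustrates that general idea only. The abstract does not state which loss MultiCTox uses, so the loss form, the temperature value, the two-dimensional embeddings, and all names here are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(view_a, view_b, temperature=0.1):
    """Mean cross-entropy of matching row i of view_a to row i of view_b."""
    n = len(view_a)
    loss = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate partners.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the true partner (same molecule).
        loss += log_denominator - logits[i]
    return loss / n

# Hypothetical embeddings of three molecules in two modalities.
smiles_emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
graph_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Same embeddings, but paired with the wrong molecules.
graph_shuffled = [graph_aligned[1], graph_aligned[2], graph_aligned[0]]

aligned_loss = info_nce(smiles_emb, graph_aligned)
shuffled_loss = info_nce(smiles_emb, graph_shuffled)
```

When each molecule's two embeddings agree, the loss is close to zero. Pairing embeddings with the wrong molecules raises it sharply, and that gap is the signal such an objective uses to align modalities.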
Pub Date : 2025-03-27 DOI: 10.1021/acs.jcim.4c02420
Audrey V Conner, Lauren M Kim, Patrick A Fagan, Drew P Harding, Steven E Wheeler
Stacking interactions contribute significantly to the interaction of small molecules with RNA, and harnessing the power of these interactions will likely prove important in the development of RNA-targeting inhibitors. To this end, we present a comprehensive computational analysis of stacking interactions between a set of 54 druglike heterocycles and the natural nucleobases. We first show that heterocycle choice can tune the strength of stacking interactions with nucleobases over a large range and that heterocycles favor stacked geometries that cluster around a discrete set of stacking loci characteristic of each nucleobase. Symmetry-adapted perturbation theory results indicate that the strengths of these interactions are modulated primarily by electrostatic and dispersion effects. Based on this, we present a multivariate predictive model of the maximum strength of stacking interactions between a given heterocycle and nucleobase that depends on molecular descriptors derived from the electrostatic potential. These descriptors can be readily computed using density functional theory or predicted directly from atom connectivity (e.g., SMILES). This model is used to predict the maximum possible stacking interactions of a set of 1854 druglike heterocycles with the natural nucleobases. Finally, we show that trivial modifications of standard (fixed-charge) molecular mechanics force fields reduce errors in predicted stacking interaction energies from around 2 kcal/mol to below 1 kcal/mol, providing a pragmatic means of predicting more reliable stacking interaction energies using existing computational workflows. We also analyze the stacking interactions between ribocil and a bacterial riboswitch, showing that two of the three aromatic heterocyclic components engage in near-optimal stacking interactions with binding site nucleobases.
Stacking Interactions of Druglike Heterocycles with Nucleobases. Journal of Chemical Information and Modeling, 2025.
Pub Date : 2025-03-27 DOI: 10.1021/acs.jcim.4c02441
Jinzhe Zeng, Timothy J. Giese, Duo Zhang, Han Wang and Darrin M. York*
Machine learning potentials (MLPs) have revolutionized molecular simulation by providing efficient and accurate models for predicting atomic interactions. MLPs continue to advance and have had profound impact in applications that include drug discovery, enzyme catalysis, and materials design. The current landscape of MLP software presents challenges due to the limited interoperability between packages, which can lead to inconsistent benchmarking practices and necessitates separate interfaces with molecular dynamics (MD) software. To address these issues, we present DeePMD-GNN, a plugin for the DeePMD-kit framework that extends its capabilities to support external graph neural network (GNN) potentials. DeePMD-GNN enables the seamless integration of popular GNN-based models, such as NequIP and MACE, within the DeePMD-kit ecosystem. Furthermore, the new software infrastructure allows GNN models to be used within combined quantum mechanical/molecular mechanical (QM/MM) applications using the range-corrected ΔMLP formalism. We demonstrate the application of DeePMD-GNN by performing benchmark calculations of NequIP, MACE, and DPA-2 models developed under consistent training conditions to ensure fair comparison.
{"title":"DeePMD-GNN: A DeePMD-kit Plugin for External Graph Neural Network Potentials","authors":"Jinzhe Zeng, Timothy J. Giese, Duo Zhang, Han Wang and Darrin M. York*","doi":"10.1021/acs.jcim.4c02441","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02441","url":null,"abstract":"<p >Machine learning potentials (MLPs) have revolutionized molecular simulation by providing efficient and accurate models for predicting atomic interactions. MLPs continue to advance and have had profound impact in applications that include drug discovery, enzyme catalysis, and materials design. The current landscape of MLP software presents challenges due to the limited interoperability between packages, which can lead to inconsistent benchmarking practices and necessitates separate interfaces with molecular dynamics (MD) software. To address these issues, we present DeePMD-GNN, a plugin for the DeePMD-kit framework that extends its capabilities to support external graph neural network (GNN) potentials. DeePMD-GNN enables the seamless integration of popular GNN-based models, such as NequIP and MACE, within the DeePMD-kit ecosystem. Furthermore, the new software infrastructure allows GNN models to be used within combined quantum mechanical/molecular mechanical (QM/MM) applications using the range corrected ΔMLP formalism. We demonstrate the application of DeePMD-GNN by performing benchmark calculations of NequIP, MACE, and DPA-2 models developed under consistent training conditions to ensure fair comparison.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"65 7","pages":"3154–3160"},"PeriodicalIF":5.6,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143825055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-27 DOI: 10.1021/acs.jcim.5c00011
Shreya Bhattacharya and Priyadarshi Satpati*
The energetic basis for the enhanced PAM (protospacer adjacent motif) readability of engineered Cas9-NG, a variant of Cas9 from Streptococcus pyogenes (SpCas9) carrying seven mutations (R1335V, E1219F, D1135V, L1111R, T1337R, G1218R, and A1322R), remains a fundamental unsolved problem. Utilizing the X-ray structure of the precatalytic complex (SpCas9:sgRNA:dsDNA) as a template, we calculated the changes in PAM (TGG, TGA, TGT, or TGC) binding affinity (ΔΔG) associated with each of the seven mutations in SpCas9 through rigorous alchemical simulations (sampling ∼53 μs). The underlying thermodynamics (ΔΔG) accounts for the experimentally observed differences in DNA cleavage activity between SpCas9 and Cas9-NG across various DNA substrates. The interaction energies between SpCas9 and DNA are significantly influenced by the type and location of the amino acid mutations. Notably, the R1335V mutation disfavors DNA binding by disrupting critical interactions with the PAM. However, the destabilizing effect of the R1335V mutation is mitigated by four advantageous mutations (E1219F, D1135V, L1111R, and T1337R), which primarily introduce nonbase-specific interactions and enhance PAM readability. The hydrophobic substitutions (E1219F and D1135V) are particularly impactful, as they exclude solvent from the PAM binding pocket, strengthening electrostatic interactions in the low dielectric medium and increasing the stability of the noncognate PAM complexes by ∼2–5 kcal/mol. Additionally, L1111R and T1337R facilitate DNA binding by forming direct electrostatic contacts. In contrast, the charge mutations G1218R and A1322R do not effectively promote interactions with the negatively charged DNA, clearly demonstrating that the location of mutations is crucial in shaping these interaction energetics. We demonstrated that stabilization of the Cas9-NG:noncognate PAM complexes enables broader PAM recognition. This is primarily achieved through two mechanisms: (1) the establishment of new nonbase-specific interactions between the protein and nucleotides and (2) the enhancement of electrostatic interactions within a relatively dry and hydrophobic pocket. The findings revealed that mutation-induced desolvation can improve the recognition of noncognate PAMs, paving the way for the rational and innovative design of SpCas9 mutants.
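To put the ∼2–5 kcal/mol stabilization quoted in the abstract in perspective, a free-energy difference can be converted into an approximate fold change in binding constant through the standard relation K_mut/K_wt = exp(|ΔΔG|/RT). The sketch below is illustrative only and is not taken from the paper: the temperature of 298.15 K and the function name `fold_change` are assumptions, and the input is the magnitude of the stabilization.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def fold_change(stabilization_kcal, temperature_k=298.15):
    """Fold increase in binding constant for a stabilization given in kcal/mol.

    `stabilization_kcal` is the magnitude of the favorable free-energy change;
    `temperature_k` defaults to an assumed 298.15 K.
    """
    return math.exp(stabilization_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Lower and upper ends of the range quoted in the abstract.
    for ddg in (2.0, 5.0):
        print(f"{ddg:.1f} kcal/mol -> ~{fold_change(ddg):.0f}-fold tighter binding")
```

Under these assumptions, the lower end of the range corresponds to roughly a 30-fold gain in binding constant and the upper end to a gain of several thousand fold, which is why a shift of a few kcal/mol can plausibly separate a recognized PAM from an unrecognized one.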
{"title":"Energetics of Expanded PAM Readability by Engineered Cas9-NG","authors":"Shreya Bhattacharya and Priyadarshi Satpati*","doi":"10.1021/acs.jcim.5c00011","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00011","url":null,"abstract":"<p >The energetic basis for the enhanced PAM (protospacer adjacent motif) readability in engineered Cas9-NG (a variant of Cas9 from <i>Streptococcus pyogenes</i> (<i>Sp</i>Cas9)) with seven mutations: (R1335V, E1219F, D1135V, L1111R, T1337R, G1218R, and A1322R) remains a fundamental unsolved problem. Utilizing the X-ray structure of the precatalytic complex (<i>Sp</i>Cas9:sgRNA:dsDNA) as a template, we calculated the changes in PAM (TGG, TGA, TGT, or TGC) binding affinity (ΔΔ<i>G</i>) associated with each of the seven mutations in <i>Sp</i>Cas9 through rigorous alchemical simulations (sampling ∼53 μs). The underlying thermodynamics (ΔΔ<i>G</i>) accounts for the experimentally observed differences in DNA cleavage activity between <i>Sp</i>Cas9 and Cas9-NG across various DNA substrates. The interaction energies between <i>Sp</i>Cas9 and DNA are significantly influenced by the type and location of the amino acid mutations. Notably, the R1335V mutation disfavors DNA binding by disrupting critical interactions with the PAM. However, the destabilizing effect of the R1335V mutation is mitigated by four advantageous mutations (E1219F, D1135V, L1111R, and T1337R), which primarily introduce nonbase-specific interactions and enhance PAM readability. The hydrophobic substitutions (E1219F and D1135V) are particularly impactful, as they exclude solvent from the PAM binding pocket, strengthening electrostatic interactions in the low dielectric medium and increasing the stability of the noncognate PAM complexes by ∼2–5 kcal/mol. Additionally, L1111R and T1337R facilitate DNA binding by forming direct electrostatic contacts. In contrast, the charge mutations G1218R and A1322R do not effectively promote interactions with the negatively charged DNA, clearly demonstrating that the location of mutations is crucial in shaping these interaction energetics. We demonstrated that stabilization of the Cas9-NG:noncognate PAM complexes enables broader PAM recognition. This is primarily achieved through two mechanisms: (1) the establishment of new nonbase-specific interactions between the protein and nucleotides and (2) the enhancement of electrostatic interactions within a relatively dry and hydrophobic pocket. The findings revealed that mutation-induced desolvation can improve the recognition of noncognate PAMs, paving the way for the rational and innovative design of <i>Sp</i>Cas9 mutants.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"65 7","pages":"3628–3639"},"PeriodicalIF":5.6,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143825056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}