Pub Date: 2026-03-19 | DOI: 10.1021/acs.jcim.5c03055
Philippe Gantzer,Micke Kuwahara,Keisuke Takahashi,Pavel Sidorov
Quantitative structure-property relationship (QSPR) modeling often requires navigating fragmented tools for descriptor calculation and model optimization. We present a major evolution of the CADS platform through the seamless integration of DOPtools, a specialized Python library for molecular descriptor calculation and model building. These additions streamline the handling of molecular data and QSPR modeling, allowing users to input both numerical features and text-encoded chemical structures to build predictive models. Key enhancements include automated hyperparameter optimization; bulk prediction capabilities; and, especially, model transparency via ColorAtom, which provides intuitive, atom-centered visualizations of model logic. By bridging this gap, the platform now offers an accessible yet powerful environment for leveraging both public and proprietary chemical data.
Title: Integration of DOPtools and CADS in a Web-Based User Interface for Structural Descriptor Calculation, Model Optimization, and Prediction. Journal of Chemical Information and Modeling, 2026-03-19. https://doi.org/10.1021/acs.jcim.5c03055
Metallocene catalysts, distinguished by their well-defined active centers and tunable coordination geometries, are pivotal in the homopolymerization of propylene to produce polypropylene with tailored properties. However, the rational design of such catalysts remains challenging due to the complex coupling between ligand structures and polymerization conditions. Conventional trial-and-error approaches are inefficient, while existing machine learning (ML) models often overlook critical ligand descriptors, limiting their generalization for industrial use. To address this, we developed a hybrid ML framework that integrates both reaction parameters and catalyst structural features. A dual-path neural network processes numerical and categorical inputs separately to avoid feature semantic distortion, enabling accurate predictions of catalyst activity (R² = 0.9201) and number-average molecular weight (R² = 0.9133). For the narrow molecular weight distribution typical of metallocene-derived polypropylene (a characteristic leading to compact, locally correlated data), a k-nearest neighbor regression model achieved superior performance (R² = 0.9766) by effectively capturing local sample relationships. Both models outperformed eight other benchmark ML algorithms across all metrics. This work provides a robust, interpretable computational strategy for linking catalyst chemistry to polymer properties, offering a practical tool for the targeted design and scalable application of high-performance polypropylene materials.
Title: Accurate Prediction of Polymerization Performance for Metallocene Catalysts via a Dual-Path Neural Network and Local Feature Learning. Authors: Jingyu Feng, Yao Qin, Tao Yang, Yufan Fan, Yiyi Zhang, Guifa Huang, Xiang Xiao, Dechao Chen, Shuangliang Zhao, Zengxi Wei. Journal of Chemical Information and Modeling, 2026-03-19. https://doi.org/10.1021/acs.jcim.5c03182
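The local-averaging idea behind the k-nearest neighbor regression mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration only: the function name `knn_regress`, the Euclidean distance, the unweighted mean, and the toy data are all assumptions, not details reported by the authors.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Predict a target as the unweighted mean of the k nearest training targets.

    Hypothetical helper for illustration; distance metric and weighting
    are assumptions, not taken from the paper.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pair each training target with its Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


# Toy one-feature data set (made up): three clustered points and one outlier.
X = [(0.0,), (1.0,), (2.0,), (10.0,)]
y = [1.0, 2.0, 3.0, 50.0]
prediction = knn_regress(X, y, (1.1,), k=3)
```

Because only the three closest samples contribute, the distant outlier at 10.0 has no effect on the prediction, which is the behavior one would want for compact, locally correlated data.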
Pub Date: 2026-03-19 | DOI: 10.1021/acs.jcim.6c00023
Rui Wu,Hui Zhang,Li-Rong Zhang,Zheng Zhang,Quan Zou,Li Liu
i-Motif (iM), a quadruplex structure formed by C-rich DNA sequences under acidic conditions, is significant for gene expression regulation, telomere stability, and cancer development. Traditional experimental methods for detecting iMs, such as circular dichroism (CD) spectroscopy and nuclear magnetic resonance (NMR), are limited by high costs and low throughput. Existing computational models relying on manual feature extraction struggle to capture complex sequence-structure relationships underlying iM formation. We introduce DeepIM, a novel deep learning model that integrates a channel-spatial attention (CSA) mechanism with a Transformer architecture to predict iM folding status with high accuracy and interpretability. DeepIM encodes DNA sequences into k-mers, using embedding and positional encoding layers to retain semantic and spatial sequence information. The CSA mechanism, in which channel attention focuses on C-tracts and spatial attention targets flanking regions, extracts local features, while the Transformer models long-range dependencies. Trained and tested on a data set of over 750,000 sequences, DeepIM achieves 92.6% accuracy, outperforming traditional methods such as XGBoost (86.0%) and random forest (87.0%), as well as the state-of-the-art computational tool, iM-Seeker (90.3%). DeepIM also demonstrates strong cross-cell-line generalization and the ability to identify distinctive iM sequence patterns, as shown by attention weight analysis and ablation experiments. Overall, DeepIM advances DNA secondary structure prediction by leveraging deep learning to understand complex sequence-structure relationships.
Title: DeepIM: Integrating Channel-Spatial Attention with Transformer for DNA i-Motif Folding Status Prediction. Journal of Chemical Information and Modeling, 2026-03-19. https://doi.org/10.1021/acs.jcim.6c00023
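The k-mer encoding step described in the DeepIM abstract can be illustrated with a short sketch. The abstract does not state the value of k, the stride, or how the vocabulary is built, so `kmerize`, `build_vocab`, `encode`, k = 3, stride = 1, and the first-appearance vocabulary below are assumptions chosen for illustration.

```python
def kmerize(sequence, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers (hypothetical settings)."""
    if k < 1 or stride < 1:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(kmers):
    """Assign integer ids to k-mers in order of first appearance."""
    vocab = {}
    for kmer in kmers:
        vocab.setdefault(kmer, len(vocab))
    return vocab


def encode(sequence, vocab, k=3, stride=1):
    """Map a sequence to the list of token ids an embedding layer would consume."""
    return [vocab[kmer] for kmer in kmerize(sequence, k, stride)]


# A short C-rich toy sequence (made up for the example).
tokens = kmerize("CCCTAACCC", k=3)
vocab = build_vocab(tokens)
token_ids = encode("CCCTAACCC", vocab, k=3)
```

The resulting ids are what an embedding layer would look up, with a positional encoding added so the model keeps track of where each k-mer sits in the sequence.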
Pub Date: 2026-03-19 | DOI: 10.1021/acs.jcim.6c00133
Matias Chiappinelli,Tadeo E Saldaño,Silvio C E Tosatto,Sergei Grudinin,Gustavo Parisi,Sebastian Fernandez-Alberti
Tandem repeat proteins (TRPs) are composed of arrays of repeating structural units that assemble into extended, superhelical, or horseshoe-shaped architectures stabilized primarily by short-range interactions. The unique sequence-structure-dynamics-function relationships of TRPs have been the subject of extensive investigation, aiming to elucidate the molecular principles that distinguish them from globular proteins. Here we explore the effects of mutations on the conformational mechanics of PR65, the HEAT-repeat scaffold of phosphatase PP2A that acts as an elastic connector between catalytic and regulatory subunits. We found that the effect of mutations on the dynamics associated with the collective conformational changes that PR65 undergoes on binding the catalytic subunit correlates with evolutionary conservation. In addition, our study reveals a common pattern among repeat units in how mutations influence these dynamics, but it also highlights functional differences among the individual units. That is, mutations in individual units share a common influence on the collective dynamics of the TRP, but each unit's particular role in function introduces additional differences in the effects of its mutations. Finally, none of these patterns is observed for the subsequent conformational changes that occur when the dimeric PR65-catalytic subunit complex binds the regulatory subunit. We believe this work highlights both the similarities and differences between repeat units in how mutations affect their dynamics. These insights may advance our understanding of TRP mechanisms in pathogenicity, enable scaffold modifications for engineered ligand binding with diverse applications, and broadly expand our knowledge of TRP function.
Title: Effects of Mutations on Tandem-Repeat Proteins Conformation Mechanisms. Application to the Phosphatase PP2A. Journal of Chemical Information and Modeling, 2026-03-19. https://doi.org/10.1021/acs.jcim.6c00133
Pub Date: 2026-03-19 | DOI: 10.1021/acs.jcim.5c02883
Keumseok Kang,Mingyeol Kim,Juseong Kim,Sanghun Sel,Giltae Song
Identifying protein-ligand binding residues is fundamental to unlocking molecular recognition and advancing therapeutic development. Sequence-based deep learning models for predicting protein-ligand binding residues have gained attention due to their scalability and ability to operate without relying on structural information. However, most existing methods primarily focus on protein sequence information without considering ligand information, even though binding residues are inherently defined through interactions with specific ligands. To address this, we propose a ligand-aware sequence-based binding residue prediction model that explicitly incorporates both residue-level information from protein sequences and ligand information. The proposed model achieved significant improvements in the prediction of ligand-binding residues, outperforming both existing sequence-based and structure-based baselines. Furthermore, pockets defined by the ligand-binding residues predicted by our model led to a stronger and more stable binding affinity compared to existing tools. These results demonstrate that our model shows significant potential for applications in virtual screening and drug discovery. Our source code is publicly available at https://github.com/GoldRiver0/LiBRe.
Title: LiBRe: A Ligand-Aware Sequence-Based Binding Residue Prediction Model for Virtual Screening. Journal of Chemical Information and Modeling, 2026-03-19. https://doi.org/10.1021/acs.jcim.5c02883
Pub Date: 2026-03-18 | DOI: 10.1021/acs.jcim.6c00007
Marta S P Batista,Miguel Machuqueiro,Bruno L Victor
Molecular dynamics (MD) simulations are a powerful tool for characterizing membrane-protein dynamics, yet their predictive accuracy critically depends on the choice of force field and membrane representation. Here, we present a systematic benchmark of the AMBER 14SB and CHARMM36m force fields across multiple bilayer sizes, using human aquaporin-7 (aquaglyceroporin-7; hAQP7) as a representative membrane protein system. Both force fields maintained global structural integrity, but differed markedly in their dynamic profiles: CHARMM36m sampled a broader conformational space and produced more hydrated pore profiles, whereas AMBER 14SB favored conformations closer to the crystallographic structure. Lipid organization and packing also diverged, with CHARMM generating more compact bilayers and AMBER yielding larger areas per lipid. The membrane size exerted minimal influence on the structural or functional descriptors, supporting the use of smaller, computationally efficient membrane patches for equilibrium simulations. The hAQP7 monomers functioned independently, without detectable cooperativity under the simulated conditions. Collectively, these results highlight the substantial impact of force-field selection on aquaporin dynamics and provide practical guidance for designing accurate MD simulations of transmembrane protein channels.
Title: Force Field and Membrane Patch Size Effects on Atomistic Models of Aquaporin-7. Journal of Chemical Information and Modeling, 2026-03-18. https://doi.org/10.1021/acs.jcim.6c00007
Pub Date: 2026-03-18 | DOI: 10.1021/acs.jcim.5c02917
Ruben Sharma, Ross D King
We introduce the first formal large-scale assessment of the utility of traditional chemical functional groups as used in chemical explanations. Our assessment employs a fundamental principle from computational learning theory: a good compression of data should reveal a good explanation. We introduce an unsupervised learning algorithm based on the Minimum Message Length (MML) principle that searches for substructures that compress around three million biologically relevant molecules. We demonstrate that the discovered substructures contain most human-curated functional groups as well as novel larger patterns with more specific functions. We also run our algorithm on 24 specific bioactivity prediction data sets to discover data set-specific functional groups. Fingerprints constructed from data set-specific functional groups are shown to significantly outperform other fingerprint representations, including the MACCS and Morgan fingerprints, when training ridge regression models on bioactivity regression tasks.
Title: Compressing Chemistry Reveals Functional Groups. Journal of Chemical Information and Modeling, 2026-03-18. https://doi.org/10.1021/acs.jcim.5c02917
Pub Date: 2026-03-18 | DOI: 10.1021/acs.jcim.5c02581
Karina Solórzano-Acevedo,Carlos Zepactonal Gómez-Castro,Emma Martín Rodríguez,Álvaro Artiga,Rosa M Quispe-Siccha,Mónica Corea,Itzia I Padilla-Martínez
The therapeutic potential of many new drugs is limited by poor aqueous solubility. This work addresses the solubilization improvement of novel fluorescent dihydropyrazole-carbohydrazide derivatives (DPCH), with proven antiproliferative activity against human breast cancer, through encapsulation in three distinct methoxy poly(ethylene glycol)-poly(ε-caprolactone) (mPEG-PCL) diblock copolymers. All-atom molecular dynamics simulations (100 ns, CHARMM36 force field, NAMD) of 52 distinct configurations revealed favorable interactions between DPCHs and PCL residues, resulting in the formation of micellar supramolecular assemblies with PEG coronae that facilitate enhanced DPCH water solvation. Systematic evaluation of micelle size, composition (5, 14, and 21 copolymer strands; 20, 30, 40, 56, 84, 112, and 168 drug molecules), and hydrophobic chain length (PCL 1k, 2k, and 5k) through radial distribution functions, radius of gyration, solvent accessibility, RMSD analysis, and interaction energy calculations identified optimal encapsulation conditions. Regardless of the DPCH derivative tested, mPEG2k-PCL5k produced the most stable, monodisperse micelle populations with the highest loading efficiency. Molecular docking calculations further confirmed strong drug-polymer affinity. Experimental validation through nanoparticle synthesis and characterization via dynamic light scattering, zeta potential measurements, cryogenic transmission electron microscopy, and fluorescence microscopy confirmed successful self-assembly with entrapment efficiencies up to 97% and internalization of loaded micelles into cancer cells. These findings demonstrate that mPEG-PCL micelles are potential carriers for DPCH derivatives, as computational predictions closely align with experimental data.
Title: Investigation of Novel Antiproliferative Drugs Interaction with mPEG2k-PCLy Copolymers Using Molecular Dynamics Simulation Approach. Journal of Chemical Information and Modeling, 2026-03-18. https://doi.org/10.1021/acs.jcim.5c02581
Pub Date: 2026-03-18 | DOI: 10.1021/acs.jcim.6c00633
Stephan Schott-Verdugo,Holger Gohlke
Large language models (LLMs) have rapidly become essential in software engineering, evolving from simple code suggestion tools to autonomous agents that directly read, modify, compile, and test local code bases. Recent LLMs perform well in software engineering benchmarks and handle complex multifile projects, which opens new options for improving and developing bio- and chemoinformatic tools. We showcase this capability with the AMBER molecular dynamics suite, where the setup program LEaP suffered from an O(N²) merge routine and a 32-bit integer overflow, limiting simulation systems to ∼6 million atoms. By using an LLM, we implemented an optimized unit merge algorithm and 64-bit indexing, cutting the parametrization time by more than 10-fold for mid-sized systems and allowing one to parametrize multimillion-molecule systems. This case illustrates how natural scientists can make use of LLM agents to modernize, optimize, and develop computational (bio)chemistry tools while also raising new challenges for software provenance and developer roles.
Title: Chat-Driven Computational (Bio)chemistry: Using LLM Agents to Accelerate Bio- and Chemoinformatics. Journal of Chemical Information and Modeling, 2026-03-18. https://doi.org/10.1021/acs.jcim.6c00633
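The two failure modes named in the abstract above, quadratic cost growth and 32-bit signed overflow, can be illustrated generically. The sketch below does not reproduce LEaP's code and makes no claim about which quantity overflowed there; `pair_count`, `as_int32`, and `fits_int32` are hypothetical helpers written for this example.

```python
import ctypes

INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def pair_count(n):
    """Number of unordered pairs among n items: the work an O(N^2) pass implies."""
    return n * (n - 1) // 2


def as_int32(value):
    """Return the value a 32-bit signed integer would hold after wraparound."""
    return ctypes.c_int32(value).value


def fits_int32(value):
    """True if the value can be stored in a 32-bit signed integer without loss."""
    return INT32_MIN <= value <= INT32_MAX


# Quadratic growth: a 10x larger system implies roughly 100x more pairs.
small = pair_count(600_000)
large = pair_count(6_000_000)

# Wraparound: one step past the 32-bit maximum becomes a large negative number.
wrapped = as_int32(INT32_MAX + 1)
```

Python integers are arbitrary precision, so the overflow has to be simulated here with `ctypes.c_int32`; in C-like code the same wraparound happens silently, which is why widening an index to 64 bits removes this class of limit.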
The accurate identification of cytochrome P450 (CYP) substrates is crucial in drug discovery and safety assessment, as these enzymes mediate the metabolism of most clinical drugs. However, existing computational models are often limited by data quality issues and lack the ability to quantify prediction uncertainty, hindering their reliable application. To address these challenges, we present EviCYP, a novel prediction framework that integrates evidential deep learning with vector quantization (VQ). We first constructed a high-quality data set by curating 4388 substrates and 2880 nonsubstrates from 1629 publications, and supplemented it with 3728 pseudonegative samples, resulting in 10,996 samples spanning nine major CYP isoforms. The EviCYP architecture processes multimodal molecular representations and enzyme sequences through dedicated encoders, compresses features via VQ to reduce redundancy, and employs an evidential layer to output both class probabilities and an uncertainty estimate. On an internal test set, EviCYP achieved an average AUROC of 0.9500. Notably, the model's uncertainty quantification is highly reliable, with high-uncertainty predictions strongly correlating with classification errors. This work provides a robust and trustworthy computational tool for CYP substrate prediction.
{"title":"EviCYP: In Silico Prediction of Cytochrome P450 Substrates Based on Vector Quantization and Evidential Deep Learning.","authors":"Yingjie Yang,Yuxin Zhang,Wenxiang Song,Keyun Zhu,Xinmin Li,Mengyu Tong,Guixia Liu,Weihua Li,Yun Tang","doi":"10.1021/acs.jcim.6c00074","DOIUrl":"https://doi.org/10.1021/acs.jcim.6c00074","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"190 1"},"PeriodicalIF":5.6,"publicationDate":"2026-03-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"147471640","RegionNum":2,"RegionCategory":"Chemistry"}
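The EviCYP abstract says an evidential layer outputs both class probabilities and an uncertainty estimate, but it does not give the formulation. A minimal sketch follows, assuming the Dirichlet-based formulation commonly used in evidential deep learning, in which non-negative evidence per class is turned into Dirichlet parameters. The function name `evidential_output` and the example evidence values are hypothetical and are not taken from the paper.

```python
def evidential_output(evidence):
    """Map per-class evidence to probabilities and an uncertainty score.

    Assumed formulation (not confirmed by the abstract):
      alpha_k = evidence_k + 1
      S       = sum of alpha_k
      p_k     = alpha_k / S
      u       = K / S, where K is the number of classes
    """
    if not evidence:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probabilities = [a / strength for a in alpha]
    uncertainty = len(alpha) / strength
    return probabilities, uncertainty


if __name__ == "__main__":
    # No evidence at all: uniform probabilities, maximal uncertainty.
    print(evidential_output([0.0, 0.0]))
    # Strong evidence for the first class: confident, low uncertainty.
    print(evidential_output([18.0, 0.0]))
```

Under this assumption the uncertainty falls as total evidence grows, which is the property that would let high-uncertainty predictions be flagged as more likely to be wrong.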