Journal of Chemical Information and Modeling: Latest Articles, Page 5

Integration of DOPtools and CADS in a Web-Based User Interface for Structural Descriptor Calculation, Model Optimization, and Prediction.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-19 DOI: 10.1021/acs.jcim.5c03055

Philippe Gantzer,Micke Kuwahara,Keisuke Takahashi,Pavel Sidorov

Quantitative structure-property relationship (QSPR) modeling often requires navigating fragmented tools for descriptor calculation and model optimization. We present a major evolution of the CADS platform through the seamless integration of DOPtools, a specialized Python library for molecular descriptor calculation and model building. These additions streamline the handling of molecular data and QSPR modeling, allowing users to input both numerical features and text-encoded chemical structures to build predictive models. Key enhancements include automated hyperparameter optimization; bulk prediction capabilities; and, especially, model transparency via ColorAtom, which provides intuitive, atom-centered visualizations of model logic. By bridging this gap, the platform now offers an accessible yet powerful environment for leveraging both public and proprietary chemical data.

Citations: 0

Accurate Prediction of Polymerization Performance for Metallocene Catalysts via a Dual-Path Neural Network and Local Feature Learning.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-19 DOI: 10.1021/acs.jcim.5c03182

Jingyu Feng,Yao Qin,Tao Yang,Yufan Fan,Yiyi Zhang,Guifa Huang,Xiang Xiao,Dechao Chen,Shuangliang Zhao,Zengxi Wei

Metallocene catalysts, distinguished by their well-defined active centers and tunable coordination geometries, are pivotal in the homopolymerization of propylene to produce polypropylene with tailored properties. However, the rational design of such catalysts remains challenging due to the complex coupling between ligand structures and polymerization conditions. Conventional trial-and-error approaches are inefficient, while existing machine learning (ML) models often overlook critical ligand descriptors, limiting their generalization for industrial use. To address this, we developed a hybrid ML framework that integrates both reaction parameters and catalyst structural features. A dual-path neural network processes numerical and categorical inputs separately to avoid feature semantic distortion, enabling accurate predictions of catalyst activity (R² = 0.9201) and number-average molecular weight (R² = 0.9133). For the narrow molecular weight distribution typical of metallocene-derived polypropylene (a characteristic leading to compact, locally correlated data), a k-nearest neighbor regression model achieved superior performance (R² = 0.9766) by effectively capturing local sample relationships. Both models outperformed eight other benchmark ML algorithms across all metrics. This work provides a robust, interpretable computational strategy for linking catalyst chemistry to polymer properties, offering a practical tool for the targeted design and scalable application of high-performance polypropylene materials.
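
The idea behind k-nearest neighbor regression mentioned in the abstract can be sketched in a few lines. This is only an illustration, not the authors' model: the feature choice (temperature and pressure), the toy values, the unweighted mean, and the function name `knn_predict` are all assumptions made here.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a target as the unweighted mean of the k nearest training targets."""
    # Euclidean distance from the query point to every training sample.
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Sort by distance only; Python's sort is stable, so ties keep input order.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical toy data: (temperature in °C, pressure in bar) -> dispersity.
train_x = [(50.0, 1.0), (60.0, 1.0), (70.0, 2.0), (80.0, 2.0)]
train_y = [2.0, 2.1, 2.3, 2.4]

# The two nearest neighbors of the query are (60, 1) and (70, 2).
prediction = knn_predict(train_x, train_y, (65.0, 1.5), k=2)
print(round(prediction, 3))
```

Because the prediction depends only on nearby samples, this kind of model can do well when targets vary little within small neighborhoods, which is the situation the abstract describes for the molecular weight distribution. In practice, features on different scales would normally be standardized before computing distances.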

Citations: 0

DeepIM: Integrating Channel-Spatial Attention with Transformer for DNA i-Motif Folding Status Prediction.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-19 DOI: 10.1021/acs.jcim.6c00023

Rui Wu,Hui Zhang,Li-Rong Zhang,Zheng Zhang,Quan Zou,Li Liu

i-Motif (iM), a quadruplex structure formed by C-rich DNA sequences under acidic conditions, is significant for gene expression regulation, telomere stability, and cancer development. Traditional experimental methods for detecting iMs, such as circular dichroism (CD) spectroscopy and nuclear magnetic resonance (NMR), are limited by high costs and low throughput. Existing computational models relying on manual feature extraction struggle to capture complex sequence-structure relationships underlying iM formation. We introduce DeepIM, a novel deep learning model that integrates a channel-spatial attention (CSA) mechanism with a Transformer architecture to predict iM folding status with high accuracy and interpretability. DeepIM encodes DNA sequences into k-mers, using embedding and positional encoding layers to retain semantic and spatial sequence information. The CSA mechanism, in which channel attention focuses on C-tracts and spatial attention targets flanking regions, extracts local features, while the Transformer models long-range dependencies. Trained and tested on a data set of over 750,000 sequences, DeepIM achieves 92.6% accuracy, outperforming traditional methods such as XGBoost (86.0%) and random forest (87.0%), as well as the state-of-the-art computational tool, iM-Seeker (90.3%). DeepIM also demonstrates strong cross-cell-line generalization and the ability to identify distinctive iM sequence patterns, as proven by attention weight analysis and ablation experiments. Overall, DeepIM advances DNA secondary structure prediction by leveraging deep learning to understand complex sequence-structure relationships.
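
The k-mer encoding step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the value of k or the stride, so the overlapping stride-1 tokenization, k = 3, and the helper names `kmer_tokens` and `build_vocab` are assumptions used only for illustration.

```python
def kmer_tokens(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1, an assumption)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(tokens):
    """Map each distinct k-mer to an integer id, in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


# Hypothetical C-rich fragment, chosen only to show repeated k-mers.
tokens = kmer_tokens("CCCTAACCC", k=3)
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]

print(tokens)
print(token_ids)
```

The integer ids are what an embedding layer would consume; positional encoding is then added on top so that the model can tell the two `CCC` tokens apart by position.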

Citations: 0

Effects of Mutations on Tandem-Repeat Proteins Conformation Mechanisms. Application to the Phosphatase PP2A.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-19 DOI: 10.1021/acs.jcim.6c00133

Matias Chiappinelli,Tadeo E Saldaño,Silvio C E Tosatto,Sergei Grudinin,Gustavo Parisi,Sebastian Fernandez-Alberti

Tandem repeat proteins (TRPs) are composed of arrays of repeating structural units that assemble into extended, superhelical, or horseshoe-shaped architectures stabilized primarily by short-range interactions. The unique sequence-structure-dynamics-function relationships of TRPs have been the subject of extensive investigation, aiming to elucidate the molecular principles that distinguish them from globular proteins. Here we explore the effects of mutations on the conformational mechanics of PR65, the HEAT-repeat scaffold of phosphatase PP2A that acts as an elastic connector between catalytic and regulatory subunits. We found that the effect of mutations on dynamics, which is associated with the collective conformational changes experienced by PR65 in its binding to the catalytic subunit, correlates with its evolutionary conservation. In addition, our study reveals a common pattern among repeat units in how mutations influence these dynamics, but it also highlights functional differences among the individual units. That is, mutations on individual units preserve a common influence on the collective dynamics of the TRP, but their individual participation in function introduces additional differences in the corresponding effects of mutations. Finally, none of these aspects are observed for the subsequent conformational changes experienced during the binding of the dimeric PR65-catalytic subunit complex to the regulatory subunit. We believe this work highlights both the similarities and differences between repeat units in how mutations affect their dynamics. These insights may advance our understanding of TRP mechanisms in pathogenicity, enable scaffold modifications for engineered ligand binding with diverse applications, and broadly expand our knowledge of TRP function.

Citations: 0

LiBRe: A Ligand-Aware Sequence-Based Binding Residue Prediction Model for Virtual Screening.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-19 DOI: 10.1021/acs.jcim.5c02883

Keumseok Kang,Mingyeol Kim,Juseong Kim,Sanghun Sel,Giltae Song

Identifying protein-ligand binding residues is fundamental to unlocking molecular recognition and advancing therapeutic development. Sequence-based deep learning models for predicting protein-ligand binding residues have gained attention due to their scalability and ability to operate without relying on structural information. However, most existing methods primarily focus on protein sequence information without considering ligand information, even though binding residues are inherently defined through interactions with specific ligands. To address this, we propose a ligand-aware sequence-based binding residue prediction model that explicitly incorporates both residue-level information from protein sequences and ligand information. The proposed model achieved significant improvements in the prediction of ligand-binding residues, outperforming both existing sequence-based and structure-based baselines. Furthermore, pockets defined by the ligand-binding residues predicted by our model led to a stronger and more stable binding affinity compared to existing tools. These results demonstrate that our model shows significant potential for applications in virtual screening and drug discovery. Our source code is publicly available at https://github.com/GoldRiver0/LiBRe.

Citations: 0

Force Field and Membrane Patch Size Effects on Atomistic Models of Aquaporin-7.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-18 DOI: 10.1021/acs.jcim.6c00007

Marta S P Batista,Miguel Machuqueiro,Bruno L Victor

Molecular dynamics (MD) simulations are a powerful tool for characterizing membrane-protein dynamics, yet their predictive accuracy critically depends on the choice of force field and membrane representation. Here, we present a systematic benchmark of the AMBER 14SB and CHARMM36m force fields across multiple bilayer sizes, using human aquaporin-7 (aquaglyceroporin-7; hAQP7) as a representative membrane protein system. Both force fields maintained global structural integrity, but differed markedly in their dynamic profiles: CHARMM36m sampled a broader conformational space and produced more hydrated pore profiles, whereas AMBER 14SB favored conformations closer to the crystallographic structure. Lipid organization and packing also diverged, with CHARMM generating more compact bilayers and AMBER yielding larger areas per lipid. The membrane size exerted minimal influence on the structural or functional descriptors, supporting the use of smaller, computationally efficient membrane patches for equilibrium simulations. The hAQP7 monomers functioned independently, without detectable cooperativity under the simulated conditions. Collectively, these results highlight the substantial impact of force-field selection on aquaporin dynamics and provide practical guidance for designing accurate MD simulations of transmembrane protein channels.
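
The "area per lipid" compared in the abstract is commonly estimated from the lateral box dimensions of a simulation frame. The sketch below shows that simple estimate; it is not taken from the paper, and the function name, the nanometre units, the optional protein-area correction, and all numeric values are assumptions for illustration.

```python
def area_per_lipid(box_x_nm, box_y_nm, lipids_per_leaflet, protein_area_nm2=0.0):
    """Estimate area per lipid (nm^2) from the lateral box area of one frame.

    protein_area_nm2 is an optional, user-supplied estimate of the membrane
    cross-section occupied by an embedded protein (assumed known here).
    """
    if lipids_per_leaflet <= 0:
        raise ValueError("lipids_per_leaflet must be positive")
    lateral_area = box_x_nm * box_y_nm - protein_area_nm2
    return lateral_area / lipids_per_leaflet


# Hypothetical frame: 10 nm x 10 nm box, 150 lipids per leaflet,
# and roughly 10 nm^2 of the plane occupied by protein.
apl = area_per_lipid(10.0, 10.0, 150, protein_area_nm2=10.0)
print(f"{apl:.3f} nm^2 per lipid")
```

In a real analysis this value would be averaged over many frames of the equilibrated trajectory, and a larger average indicates looser lipid packing.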

Citations: 0

Compressing Chemistry Reveals Functional Groups.

IF 5.3 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-18 DOI: 10.1021/acs.jcim.5c02917

Ruben Sharma, Ross D King

We introduce the first formal large-scale assessment of the utility of traditional chemical functional groups as used in chemical explanations. Our assessment employs a fundamental principle from computational learning theory: a good compression of data should reveal a good explanation. We introduce an unsupervised learning algorithm based on the Minimum Message Length (MML) principle that searches for substructures that compress around three million biologically relevant molecules. We demonstrate that the discovered substructures contain most human-curated functional groups as well as novel larger patterns with more specific functions. We also run our algorithm on 24 specific bioactivity prediction data sets to discover data set-specific functional groups. Fingerprints constructed from data set-specific functional groups are shown to significantly outperform other fingerprint representations, including the MACCS and Morgan fingerprint, when training ridge regression models on bioactivity regression tasks.

Citations: 0

Investigation of Novel Antiproliferative Drugs Interaction with mPEG2k-PCLy Copolymers Using Molecular Dynamics Simulation Approach.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-18 DOI: 10.1021/acs.jcim.5c02581

Karina Solórzano-Acevedo,Carlos Zepactonal Gómez-Castro,Emma Martín Rodríguez,Álvaro Artiga,Rosa M Quispe-Siccha,Mónica Corea,Itzia I Padilla-Martínez

The therapeutic potential of many new drugs is limited by poor aqueous solubility. This work addresses the solubilization improvement of novel fluorescent dihydropyrazole-carbohydrazide derivatives (DPCH), with proven antiproliferative activity against human breast cancer, through encapsulation in three distinct methoxy poly(ethylene glycol)-poly(ε-caprolactone) (mPEG-PCL) diblock copolymers. All-atom molecular dynamics simulations (100 ns, CHARMM36 force field, NAMD) of 52 distinct configurations revealed favorable interactions between DPCHs and PCL residues, resulting in the formation of micellar supramolecular assemblies with PEG coronae that facilitate enhanced DPCH water solvation. Systematic evaluation of micelle size, composition (5, 14, and 21 copolymer strands; 20, 30, 40, 56, 84, 112, and 168 drug molecules), and hydrophobic chain length (PCL 1k, 2k, and 5k) through radial distribution functions, radius of gyration, solvent accessibility, RMSD analysis, and interaction energy calculations identified optimal encapsulation conditions. Regardless of the DPCH derivative tested, mPEG2k-PCL5k produced the most stable, monodisperse micelle populations with the highest loading efficiency. Molecular docking calculations further confirmed strong drug-polymer affinity. Experimental validation through nanoparticle synthesis and characterization via dynamic light scattering, zeta potential measurements, cryogenic transmission electron microscopy, and fluorescence microscopy confirmed successful self-assembly with entrapment efficiencies up to 97% and internalization of loaded micelles into cancer cells. These findings demonstrate that mPEG-PCL micelles are potential carriers for DPCH derivatives, as computational predictions closely align with experimental data.
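
Of the analyses listed in the abstract, the radius of gyration is the simplest to state: it is the mass-weighted root-mean-square distance of the atoms from their center of mass. The sketch below computes it for a handful of points; the coordinates, masses, and function name are hypothetical and are not data from the study.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of a set of 3D points."""
    total_mass = sum(masses)
    # Center of mass, one component per Cartesian axis.
    com = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(
        m * sum((c[axis] - com[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Hypothetical example: four equal-mass atoms on a unit circle in the xy-plane.
coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
masses = [12.0, 12.0, 12.0, 12.0]
rg = radius_of_gyration(coords, masses)
print(rg)
```

Tracked over a trajectory, a radius of gyration that settles to a steady value is one common indication that a micelle-like assembly has stopped expanding or collapsing.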

Citations: 0

Chat-Driven Computational (Bio)chemistry: Using LLM Agents to Accelerate Bio- and Chemoinformatics.

IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-18 DOI: 10.1021/acs.jcim.6c00633

Stephan Schott-Verdugo,Holger Gohlke

Large language models (LLMs) have rapidly become essential in software engineering, evolving from simple code suggestion tools to autonomous agents that directly read, modify, compile, and test local code bases. Recent LLMs perform well on software engineering benchmarks, including complex multifile projects, which opens new options for improving and developing bio- and chemoinformatic tools. We showcase this capability with the AMBER molecular dynamics suite, where the setup program LEaP suffered from an O(N²) merge routine and a 32-bit integer overflow, limiting simulation systems to ∼6 million atoms. By using an LLM, we implemented an optimized unit merge algorithm and 64-bit indexing, cutting the parametrization time by more than 10-fold for mid-sized systems and allowing one to parametrize multimillion-molecule systems. This case illustrates how natural scientists can make use of LLM agents to modernize, optimize, and develop computational (bio)chemistry tools while also raising new challenges for software provenance and developer roles.
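
The two problems named in the abstract can be illustrated with a minimal Python sketch. This is not LEaP's code: the function names and the toy "unit" representation (lists of name/value pairs) are assumptions made only for illustration, and the abstract does not say which quantity overflowed. The sketch contrasts a merge that rescans the whole list for every incoming entry (quadratic) with one that tracks seen names in a set (linear on average), and shows how a 32-bit signed integer wraps once a value passes 2^31 - 1.

```python
import ctypes


def merge_quadratic(base, extra):
    """Merge `extra` into `base`, rescanning the merged list per entry: O(N^2)."""
    merged = list(base)
    for name, value in extra:
        # Linear scan over everything merged so far for each incoming entry.
        if not any(existing == name for existing, _ in merged):
            merged.append((name, value))
    return merged


def merge_hashed(base, extra):
    """Same result, but membership is checked in a set: O(N) on average."""
    merged = list(base)
    seen = {name for name, _ in base}
    for name, value in extra:
        if name not in seen:
            seen.add(name)
            merged.append((name, value))
    return merged


def as_int32(value):
    """Return `value` as a 32-bit signed integer would store it (wraps on overflow)."""
    return ctypes.c_int32(value).value


# Hypothetical toy data: entries already present in `base` are skipped.
base = [("C1", 0), ("N1", 1)]
extra = [("N1", 9), ("O1", 2)]
assert merge_quadratic(base, extra) == merge_hashed(base, extra)
```

Both functions return the same merged list; only the cost of the membership check differs. Any index or counter held in a 32-bit signed type wraps to a negative number past 2,147,483,647, which is the kind of failure that wider (64-bit) indexing avoids.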

Citations: 0

EviCYP: In Silico Prediction of Cytochrome P450 Substrates Based on Vector Quantization and Evidential Deep Learning.

IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-03-17 DOI: 10.1021/acs.jcim.6c00074

Yingjie Yang,Yuxin Zhang,Wenxiang Song,Keyun Zhu,Xinmin Li,Mengyu Tong,Guixia Liu,Weihua Li,Yun Tang

The accurate identification of cytochrome P450 (CYP) substrates is crucial in drug discovery and safety assessment, as these enzymes mediate the metabolism of most clinical drugs. However, existing computational models are often limited by data quality issues and lack the ability to quantify prediction uncertainty, hindering their reliable application. To address these challenges, we present EviCYP, a novel prediction framework that integrates evidential deep learning with vector quantization (VQ). We first constructed a high-quality data set by curating 4388 substrates and 2880 nonsubstrates from 1629 publications, and supplemented it with 3728 pseudonegative samples, resulting in 10,996 samples spanning nine major CYP isoforms. The EviCYP architecture processes multimodal molecular representations and enzyme sequences through dedicated encoders, compresses features via VQ to reduce redundancy, and employs an evidential layer to output both class probabilities and an uncertainty estimate. On an internal test set, EviCYP achieved an average AUROC of 0.9500. Notably, the model's uncertainty quantification is highly reliable, with high-uncertainty predictions strongly correlating with classification errors. This work provides a robust and trustworthy computational tool for CYP substrate prediction.
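
As a rough illustration of the two ideas named in the abstract, the Python sketch below shows a nearest-codebook lookup (the core step of vector quantization) and a commonly used Dirichlet-based evidential output. Whether EviCYP uses exactly these formulas is an assumption; the abstract does not give them, and the function names and toy inputs here are hypothetical. In this formulation, non-negative per-class evidence e_k gives Dirichlet parameters alpha_k = e_k + 1, class probabilities alpha_k / S, and an uncertainty K / S, where S is the sum of the alpha_k and K is the number of classes.

```python
import math


def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector` (Euclidean)."""
    index = min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))
    return index, codebook[index]


def evidential_output(evidence):
    """Map non-negative per-class evidence to (probabilities, uncertainty).

    Uses alpha_k = evidence_k + 1, probabilities alpha_k / S, and
    uncertainty K / S with S = sum(alpha). Assumed formulation, not
    confirmed as the one used by EviCYP.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probabilities = [a / strength for a in alpha]
    uncertainty = len(alpha) / strength
    return probabilities, uncertainty


# Hypothetical two-entry codebook and a feature vector close to the second entry.
index, code = quantize([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0]])

# No evidence at all: uniform probabilities and maximal uncertainty.
probs_none, u_none = evidential_output([0.0, 0.0])

# Strong evidence for the first class: confident prediction, low uncertainty.
probs_strong, u_strong = evidential_output([18.0, 0.0])
```

With zero evidence the uncertainty is 1 and the probabilities are uniform; as evidence accumulates for one class, the uncertainty shrinks toward 0. This is one way a single forward pass can report both a class probability and an uncertainty estimate.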

Citations: 0