Journal of Chemical Information and Modeling: Latest Articles

scII: Dual-Threshold Adaptive Integration of Single-Cell Multiomics Data Driven by Imputation
IF 5.3 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2026-01-15 | DOI: 10.1021/acs.jcim.5c02396
Yi Zhang, Yuru Li*, Zhicheng Jin, Ye Tian, and Chen Su

Single-cell multiomics technologies provide unprecedented opportunities to dissect cellular heterogeneity by capturing multidimensional information on complex cellular states and regulatory networks. However, challenges such as high dimensionality, extreme data sparsity, and modality-specific discrepancies hinder the accuracy, interpretability, and scalability of the existing integration methods. Existing integration paradigms, including horizontal, vertical, and diagonal strategies, are further limited by their inability to fully capture nonlinear biological relationships, their reliance on high-quality data, and their substantial computational demands. Here, we present scII (Dual-Threshold Adaptive Integration of Single-Cell Multiomics Data Driven by Imputation), an adaptive framework designed to integrate gene expression (scRNA-seq) and chromatin accessibility (scATAC-seq) data. Our approach is built on several key conceptual innovations: (i) scRNA-seq–guided signal imputation to enhance information integrity in scATAC-seq; (ii) a multilayer perceptron with the Maxout activation function to improve the modeling of complex nonlinear relationships and mitigate the vanishing gradient problem; (iii) a dynamic dual-threshold adaptive selection mechanism that jointly evaluates cross-modality feature similarity and classification reliability to select high-quality cells; and (iv) Bayesian Information Criterion (BIC)-based optimization to dynamically determine the number of Gaussian Mixture Model components according to data distribution, thereby eliminating reliance on manually preset parameters. Extensive experiments on multiple real-world and simulated data sets demonstrate that scII not only enables efficient integration of unpaired scRNA-seq and scATAC-seq data but also achieves accurate transfer of cell-type annotations, allowing high-precision cell-type prediction for scATAC-seq.
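Two of the components named in this abstract, the Maxout activation and BIC-based selection of the number of Gaussian Mixture Model components, can be illustrated with a minimal sketch. This is not the scII implementation: the function names, the full-covariance parameter count, and the log-likelihood values in the usage note are assumptions made purely for illustration.

```python
import math


def maxout(x, weights, biases):
    """Maxout unit: the maximum over k affine pieces w_j . x + b_j."""
    return max(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )


def gmm_parameter_count(n_components, n_dims):
    """Free parameters of a GMM, assuming full covariance matrices."""
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2
    mixing_weights = n_components - 1  # weights sum to 1
    return means + covariances + mixing_weights


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: p * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def select_n_components(log_likelihoods, n_dims, n_samples):
    """Pick the component count with the lowest BIC.

    log_likelihoods maps a candidate component count to the maximized
    log-likelihood of a GMM fitted with that many components.
    """
    scores = {
        k: bic(ll, gmm_parameter_count(k, n_dims), n_samples)
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get)
```

As a usage note, `select_n_components({1: -400.0, 2: -350.0, 3: -349.0}, n_dims=2, n_samples=100)` picks 2 components with these made-up log-likelihoods: the third component improves the fit too little to justify its extra parameters. Likewise, `maxout([-2.5], [[1.0], [-1.0]], [0.0, 0.0])` reproduces the absolute value, which shows how a Maxout unit builds a piecewise-linear function.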

Journal of Chemical Information and Modeling, vol. 66, no. 3, pp. 1445–1456. Open-access PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c02396
Citations: 0
Conformational Transition of the CARF Domain Driven by Binding Free Energy
IF 5.3 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2026-01-14 | DOI: 10.1021/acs.jcim.5c02521
Guodong Hu*, Jin Qian, Chengfei Cai, and Jianzhong Chen*

Type III CRISPR systems provide adaptive immunity against invasion of foreign nucleic acids by generating cyclic oligoadenylate (cAn) second messengers, which activate effector proteins containing CRISPR-associated Rossmann fold (CARF) domains. The apo form of CARF adopts a closed state, distinct from its cA4-bound open-state conformation. To investigate the conformational transition, we performed multiple types of molecular dynamics (MD) simulations, revealing a unidirectional conformational shift toward the closed state. This transition was hindered by reduced flexibility in cA4-binding residues. Notably, the conformational change primarily occurs between the two monomers, with minimal structural rearrangement within individual monomers. Comparative analysis showed that while the number of hydrogen bonds and contacts between CARF and cA4 decreases in the closed state, intermonomer interactions are strengthened. Binding free-energy calculations between the two chains of CARF further confirmed higher affinity in the closed state. Our findings support an energy-driven conformational change model, providing insights for optimizing CRISPR-based genetic manipulation tools.

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 1179–1189.
Citations: 0
BOLD-GPCRs: A Transformer-Powered App for Predicting Ligand Bioactivity and Mutational Effects across Class A GPCRs
IF 5.3 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2026-01-14 | DOI: 10.1021/acs.jcim.5c01858
Davide Provasi, Kirill Konovalov, Nicholas Riina, Olivia Cullen, and Marta Filizola*

G Protein-Coupled Receptors (GPCRs) are important targets for drug discovery owing to their ability to respond to a broad range of stimuli and their involvement in numerous pathologies. Although traditional ligand-based and structure-based approaches have facilitated the development of effective therapeutics for many GPCRs, these approaches often fall short when applied to receptors with limited ligand or structural data. This limitation highlights the critical need for advanced strategies capable of accurately predicting ligand bioactivity across the entire GPCR family, especially for understudied receptor subtypes. In this study, we introduce BOLD-GPCRs (BERT-Optimized Ligand Discovery for GPCRs), a deep learning framework designed to enhance the prediction of ligand bioactivity across class A GPCRs. Accessible via a user-friendly web interface, BOLD-GPCRs employs transfer learning and leverages curated data sets of known class A GPCR ligands, receptor sequences, and signaling-relevant mutations. By integrating dense neural network classifiers with transformer-based protein language models, BOLD-GPCRs captures complex relationships between receptor sequence/function and ligand activity. Our results demonstrate that BOLD-GPCRs achieves robust predictive performance for both ligand bioactivity and mutational effects across a broad range of class A GPCRs, underscoring its potential as a valuable tool for ligand discovery, especially for poorly characterized receptors.

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 855–866.
Citations: 0
Prediction of Protein–Ligand Binding Affinities Using Atomic Surface Site Interaction Points
IF 5.3 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2026-01-14 | DOI: 10.1021/acs.jcim.5c02628
Katarzyna J. Zator, Maria Chiara Storer, and Christopher A. Hunter*

The Atomic Surface Site Interaction Point (AIP) approach, previously used to predict association constants for synthetic host–guest systems, has been extended to protein–ligand complexes. AIP descriptions of protein binding sites were obtained by combining a library of precomputed AIP descriptors for all protein functional groups with a graph-based substructure matching algorithm. The corresponding AIP descriptions of ligands were obtained directly by footprinting the molecular electrostatic potential surface calculated using density functional theory. These AIP descriptions were projected onto X-ray crystal structures of protein–ligand complexes to identify pairs of AIPs that were sufficiently close in space to constitute an intermolecular interaction. The overall free energy of binding was calculated by summing the contributions of each AIP contact and the associated desolvation. Application to the 94 complexes involving uncharged ligands in the CASF benchmark data set showed that the method achieves a Pearson correlation coefficient of 0.76 and an RMSD of 11 kJ mol⁻¹ for absolute free energies of binding.
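The additive scoring and the two reported evaluation metrics can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, and the per-contact energies are assumed to be supplied already (the abstract does not give the functional form of an individual AIP contact term).

```python
import math


def total_binding_free_energy(contact_terms):
    """Sum per-contact contributions, each assumed to include desolvation."""
    return sum(contact_terms)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rmsd(predicted, reference):
    """Root-mean-square deviation between predicted and reference values."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))
```

In a benchmark of this kind, `total_binding_free_energy` would be evaluated once per complex, and `pearson_r` and `rmsd` would then compare the resulting list of predictions against the experimental free energies in the same units (kJ mol⁻¹ in the abstract).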

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 1097–1105. Open-access PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c02628
Citations: 0
Glycans Modulate the Adsorption of RBD Glycoproteins on Polarizable Surfaces
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2026-01-14 | DOI: 10.1021/acs.jcim.5c02363
Antonio M Bosch-Fernández, Willy Menacho, Rubén Pérez, Horacio V Guzman
Numerous respiratory viruses are transmitted via airborne microdroplets that frequently adhere to fomites. Understanding the behavior of these phenomenologically rich bio-material interfaces remains an open issue. Here, we tackle the complex interplay between glycans and protein conformational dynamics during adsorption onto polarizable surfaces, focusing on the potential of glycans as molecular interaction modulators. We employ molecular dynamics simulations to dissect the interactions of the Receptor Binding Domain (RBD) glycoproteins from different SARS-CoV-2 variants of concern (VoC), in both open and closed conformations, with polarizable planar interfaces. Advanced analysis using 2D space reveals distinct adsorption mechanisms depending on the initial loci of the glycan within the protein wall. Hydrophobic surfaces facilitate stable adsorption for both RBD conformations. Conversely, hydrophilic surfaces exhibit reduced adsorption, particularly for the closed-RBD, where glycans predominantly form hydrogen bonds. Glycans significantly modulate closed-RBD adsorption, either enhancing it by permanent tethering or impeding it depending on the initial conformation and protein mutations (Omicron). Results for the individual RBDs are consistent with scaled-up simulations for the complete spike ectodomain glycoprotein. Our findings unveil novel glycan-mediated adsorption phenomena and provide fundamental insights into glycoprotein-surface interactions, paving the way for understanding glycan roles in glycoprotein-fomite adsorption, protein aggregation, and recognition at polarizable biological interfaces.
Journal of Chemical Information and Modeling, vol. 261, no. 1.
Citations: 0
Uncertainty Quantification in Molecular Machine Learning for Property Predictions under Data Shifts
IF 5.3 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2026-01-14 | DOI: 10.1021/acs.jcim.5c02381
Raquel Parrondo-Pizarro, Jessica Lanini, and Raquel Rodríguez-Pérez*

Drug discovery and medicinal chemistry efforts are increasingly influenced by machine learning (ML), with compound property prediction as a central application. ML models have demonstrated strong performance in predicting various compound properties from chemical structure. However, these models can exhibit varying levels of prediction error, making uncertainty quantification (UQ) essential for informed decisions. Standard UQ metrics include the distance to the molecules in the training set and prediction variance, obtained through methods such as model ensembles or Bayesian modeling. Although several UQ methodologies have been developed in recent years, no single approach has consistently outperformed the others. Herein, we present a comprehensive benchmark of UQ strategies for ML-based prediction of absorption, distribution, metabolism, and excretion (ADME) properties, using both in-house and public data sets. We employed the recently introduced UNIQUE (UNcertaInty QUantification bEnchmarking) framework and evaluated UQ method performance under data shifts. Our findings indicate that data-based UQ metrics (e.g., chemical distance) and model-based UQ metrics (e.g., predicted value and variance) may capture complementary aspects of uncertainty. Their combination through error models, designed to predict the original ML model's error, yielded higher-quality uncertainty estimates. These error models emerged as a promising strategy for enhancing UQ, showing robustness under various degrees and types of data shift. Taken together, our work highlights the potential of combining diverse UQ metrics and error modeling to improve reliability in molecular property prediction. By establishing standardized evaluation setups and assessing UQ under data shifts, we provide a foundation for future UQ method development and benchmarking in the field.
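The two families of UQ metrics mentioned in this abstract can be sketched as follows: a data-based metric (distance to the nearest training molecule) and a model-based metric (variance across an ensemble's predictions). This is a minimal illustration, not the UNIQUE framework's API. The function names are hypothetical, fingerprints are represented as plain Python sets of on-bit indices, and Tanimoto distance is an assumed choice of "chemical distance".

```python
from statistics import mean, pvariance


def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union


def distance_to_training_set(query_fp, training_fps):
    """Data-based UQ metric: distance to the closest training molecule."""
    return min(tanimoto_distance(query_fp, fp) for fp in training_fps)


def ensemble_mean_and_variance(predictions):
    """Model-based UQ metric: spread of predictions across ensemble members."""
    return mean(predictions), pvariance(predictions)
```

An error model of the kind the abstract describes would take features such as these (distance, predicted value, variance) as inputs and be trained to predict the original model's error; that training step is omitted here because the abstract does not specify it.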

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 923–935. Open-access PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c02381
Citations: 0
Discovery of a Covalent Small-Molecule eEF1A1 Inhibitor via Structure-Based Virtual Screening
IF 5.3 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2026-01-13 | DOI: 10.1021/acs.jcim.5c02496
Yangping Deng, Sizheng Li, Liang Wang, Jianping Lin, Haohao Fu, Jing Li*, and Yue Chen*

Pancreatic cancer remains a formidable health challenge due to its late-stage diagnosis and limited therapeutic options, underscoring the need for novel targets and modalities. Our previous work revealed that the natural product BE-43547A2 could effectively inhibit the progression of pancreatic cancer by covalently binding to eukaryotic translation elongation factor 1 α 1 (eEF1A1) at Cys234 (C234). Given its critical role in protein synthesis and its association with pancreatic cancer progression, eEF1A1 is a promising novel target for pancreatic cancer. However, rational drug design methods for eEF1A1 are severely lacking. Herein, using microsecond-scale molecular dynamics (MD) simulations, we identify an eEF1A1 conformation suitable for structure-based virtual screening (SBVS) targeting residue C234. Through a tailored SBVS pipeline, we identified AKOS-04 as a novel small-molecule covalent inhibitor with nanomolar potency (IC50 = 28.5 ± 2.86 nM in the PATU8988T cell line). Notably, cellular thermal shift assays (CETSA) combined with dithiothreitol (DTT) and iodoacetamide (IAM) treatments confirmed the covalent, cysteine-involved interaction between AKOS-04 and eEF1A1. Further structural modification validated the critical contribution of the double bond in the acrylamide group of AKOS-04 to its covalent binding with eEF1A1: compound 9, in which this double bond is replaced by a single bond, lost its inhibitory activity. MST experiments confirmed direct binding of the compounds to the eEF1A1 protein. AKOS-04 exhibited the strongest binding among the tested compounds, consistent with effective covalent target engagement. Finally, MD simulations and pair-interaction energy analyses highlighted Lys84, Arg218, and Glu230 of eEF1A1 as key residues driving its binding to AKOS-04. These results show that AKOS-04, identified by SBVS against C234 of eEF1A1, represents a promising lead for eEF1A1-targeted pancreatic cancer therapy, highlighting the power of computational approaches in covalent drug discovery.

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 1083–1096.
Citations: 0
PepFoundry: A Pipeline for Building Machine-Learning Ready Representations of Nonstandard Peptides Containing Cycles, Non-natural Residues, Polymer Units, and More
IF 5.3 | Zone 2 (Chemistry) | Q1 Chemistry, Medicinal | Pub Date: 2026-01-13 | DOI: 10.1021/acs.jcim.5c02629
Daniel Garzon Otero, Omid Akbari, Aneesh Mandapati, and Camille Bilodeau*

Peptides featuring synthetic modifications, such as noncanonical amino acids, backbone modifications, cyclic structures, and polymer units, have become central to modern drug design because of their enhanced stability and functional diversity. However, current machine learning (ML) approaches are limited by the difficulty of transforming peptide sequences into atom-level representations, so ML efforts have focused largely on datasets of linear peptides composed of standard residues. Here, we present PepFoundry, a Python package that handles peptide sequences beyond canonical amino acids and linear topologies by using SMILES strings in the CHUCKLES format. PepFoundry generates atom-mapped RDKit molecule objects, enabling the extraction of atom-level features such as Morgan fingerprints and graph representations. We demonstrate its utility by processing a dataset of peptide sequences containing noncanonical amino acids and generating atom-level features for downstream property prediction. We show that atom-level representations of peptides containing noncanonical amino acids consistently outperform sequence-level representations, regardless of model type. We additionally explore the representation of noncanonical peptides through latent space visualization and show that models with atom-level information can effectively learn relationships between analogous sequences of L-peptides, D-peptides, and peptoids. This framework allows for the flexible incorporation of new amino acid chemistries, enabling existing ML methods to be applied directly to datasets of peptides containing nonstandard features. It also facilitates the rapid construction of customized peptide libraries and provides a scalable platform to accelerate ML-driven peptide discovery and optimization.

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 1264–1273. Published 2026-01-13. PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c02629
Citations: 0
DNACSE: Enhancing Genomic LLMs with Contrastive Learning for DNA Barcode Identification
IF 5.3 | Zone 2 (Chemistry) | Q1 Chemistry, Medicinal | Pub Date: 2026-01-13 | DOI: 10.1021/acs.jcim.5c02747
Jiadong Wang, Bin Wang*, Shihua Zhou*, Ben Cao*, Wei Li, and Pan Zheng

DNA barcoding is a powerful tool for exploring biodiversity, and DNA language models have significantly facilitated barcode construction and identification. However, DNA barcodes come from a specific region of mitochondrial DNA and differ structurally from the reference genomes used to train existing DNA language models, so it is difficult to apply those models directly to the DNA barcoding task. To address this, this paper introduces DNACSE (DNA Contrastive Learning for Sequence Embeddings), an unsupervised noise-contrastive learning framework designed to fine-tune the DNA language foundation model while improving the distribution of the embedding space. The results demonstrate that DNACSE outperforms the direct use of DNA language models in DNA barcoding-related tasks. Specifically, in fine-tuning and linear probe tasks, it achieves accuracy rates of 99.17% and 98.31%, respectively, surpassing the current state-of-the-art BarcodeBERT by 6.44% in both cases. In zero-shot clustering tasks, it raises the adjusted mutual information (AMI) score to 92.25%, an improvement of 8.36%. In addition, evaluations on zero-shot and genomic benchmarks indicate that DNACSE enhances the performance of DNA language models in generalized genomic tasks. In summary, DNACSE demonstrates excellent performance in DNA barcode species classification by making full use of multispecies information and DNA barcode information, providing a feasible way to further explore and protect biodiversity. The code repository is available at https://github.com/Kavicy/DNACSE.
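
The abstract above does not state the exact training objective. As a rough illustration of what a contrastive objective over sequence embeddings looks like, here is a minimal sketch of the generic InfoNCE loss with in-batch negatives. The function names, the temperature value, and the toy vectors are assumptions for illustration only and are not taken from the DNACSE code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    anchors[i] should match positives[i]; every other positive in the
    batch acts as a negative for anchor i. The temperature is a
    hypothetical value chosen for this sketch.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy embeddings: two orthogonal "sequence" vectors.
batch = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce_loss(batch, batch))    # near zero: each anchor matches its positive
print(info_nce_loss(batch, swapped))  # large: each anchor matches the wrong positive
```

The loss is small when each embedding is closest to its own positive pair and large otherwise, which is the property a contrastive fine-tuning step relies on to reshape an embedding space.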

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 976–993. Published 2026-01-13.
Citations: 0
The Blobulator: A Toolkit for Identification and Visual Exploration of Hydrophobic Modularity in Protein Sequences
IF 5.3 | Zone 2 (Chemistry) | Q1 Chemistry, Medicinal | Pub Date: 2026-01-13 | DOI: 10.1021/acs.jcim.5c01585
Connor Pitman, Ezry Santiago-McRae, Ruchi Lohia, Ryan Lamb, Kaitlin Bassi, Lindsey Riggs, Thomas T. Joseph, Matthew E. B. Hansen, and Grace Brannigan*

While contiguous subsequences of hydrophobic residues are essential to protein structure and function, as in the hydrophobic core and transmembrane regions, no current bioinformatics tools for module identification focus on hydrophobicity. To fill this gap, we created the blobulator toolkit for detecting, visualizing, and characterizing hydrophobic modules in protein sequences. This toolkit uses our previously developed algorithm, blobulation, which was critical both in interpreting intraprotein contacts in a series of intrinsically disordered protein simulations (Lohia et al., 2019) and in defining the “local context” around disease-associated mutations across the human proteome (Lohia et al., 2022). The blobulator toolkit provides accessible, interactive, and scalable implementations of blobulation. These are available via a webtool, a visual molecular dynamics (VMD) plugin, and a command line interface. We highlight use cases for visualization, interaction analysis, and modular annotation through three example applications: a globular protein, two orthologous membrane proteins, and an intrinsically disordered protein. The blobulator webtool can be found at www.blobulator.branniganlab.org, and the source code with a pip-installable command line tool, as well as the VMD plugin with installation instructions, can be found on GitHub at www.GitHub.com/BranniganLab/blobulator.
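
To make the idea of a contiguous hydrophobic module concrete, the following is a simplified sketch, not the blobulator implementation. It assumes a Kyte-Doolittle hydropathy scale rescaled to [0, 1], a sliding-window average, and a threshold plus minimum-length rule. The function names and the default `threshold`, `min_length`, and `window` values are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def smoothed_hydropathy(seq, window=3):
    """Rescale hydropathy to [0, 1], then average over a centered window."""
    scaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq]
    half = window // 2
    smoothed = []
    for i in range(len(scaled)):
        chunk = scaled[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def hydrophobic_segments(seq, threshold=0.4, min_length=4, window=3):
    """Return (start, end) pairs, 0-based with end exclusive, for contiguous
    runs whose smoothed hydropathy exceeds `threshold` and whose length is
    at least `min_length`. All defaults are assumptions for this sketch."""
    values = smoothed_hydropathy(seq, window)
    segments = []
    start = None
    # A sentinel below any threshold closes a run that reaches the sequence end.
    for i, h in enumerate(values + [float("-inf")]):
        if h > threshold and start is None:
            start = i
        elif h <= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    return segments


print(hydrophobic_segments("DDDDDLLLLLLLDDDDD"))  # one segment covering the leucine run
print(hydrophobic_segments("DDLLDD"))             # run too short, so no segment
```

Raising `threshold` or `min_length` makes segment detection stricter; short hydrophobic stretches are then treated as part of the surrounding non-hydrophobic region.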

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 820–828. Published 2026-01-13. PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c01585
Citations: 0