Pub Date : 2026-01-21 DOI: 10.1021/acs.jcim.5c02441
Oleksandra Herasymenko,Madhushika Silva,Galen J. Correy,Abd Al-Aziz A. Abu-Saleh,Suzanne Ackloo,Cheryl Arrowsmith,Alan Ashworth,Fuqiang Ban,Hartmut Beck,Kevin P. Bishop,Hugo J. Bohórquez,Albina Bolotokova,Marko Breznik,Irene Chau,Yu Chen,Artem Cherkasov,Wim Dehaen,Dennis Della Corte,Katrin Denzinger,Niklas P. Doering,Kristina Edfeldt,Aled Edwards,Darren Fayne,Francesco Gentile,Elisa Gibson,Ozan Gokdemir,Anders Gunnarsson,Judith Günther,John J. Irwin,Jan Halborg Jensen,Rachel J. Harding,Alexander Hillisch,Laurent Hoffer,Anders Hogner,Ashley Hutchinson,Shubhangi Kandwal,Andrea Karlova,Kushal Koirala,Sergei Kotelnikov,Dima Kozakov,Juyong Lee,Soowon Lee,Uta Lessel,Sijie Liu,Xuefeng Liu,Peter Loppnau,Jens Meiler,Rocco Moretti,Yurii S. Moroz,Charuvaka Muvva,Tudor I. Oprea,Brooks Paige,Amit Pandit,Keunwan Park,Gennady Poda,Mykola V. Protopopov,Vera Pütter,Rahul Ravichandran,Didier Rognan,Edina Rosta,Yogesh Sabnis,Thomas Scott,Almagul Seitova,Purshotam Sharma,François Sindt,Minghu Song,Casper Steinmann,Rick Stevens,Valerij Talagayev,Valentyna V. Tararina,Olga Tarkhanova,Damon Tingey,John F. Trant,Dakota Treleaven,Alexander Tropsha,Patrick Walters,Jude Wells,Yvonne Westermaier,Gerhard Wolber,Lars Wortmann,Shuangjia Zheng,James S. Fraser,Matthieu Schapira
The third Critical Assessment of Computational Hit-finding Experiments (CACHE) challenged computational teams to identify chemically novel ligands targeting the macrodomain 1 of SARS-CoV-2 Nsp3, a promising coronavirus drug target. Twenty-three groups deployed diverse design strategies to collectively select 1739 ligand candidates. While over 85% of the designed molecules were chemically novel, the best experimentally confirmed hits were structurally similar to previously published compounds. Confirming a trend observed in CACHE #1 and #2, two of the best-performing workflows used compounds selected by physics-based computational screening methods to train machine learning models able to rapidly screen large chemical libraries, while four others used exclusively physics-based approaches. Three pharmacophore searches and one fragment growing strategy were also part of the seven winning workflows. While active molecules discovered by CACHE #3 participants largely mimicked the adenine ring of the endogenous substrate, ADP-ribose, preserving the canonical chemotype commonly observed in previously reported Nsp3-Mac1 ligands, they still provide novel structure–activity relationship insights that may inform the development of future antivirals. Collectively, these results show that multiple molecular design strategies can efficiently converge on similar potent molecules.
Title : CACHE Challenge #3: Targeting the Nsp3 Macrodomain of SARS-CoV-2. Journal : Journal of Chemical Information and Modeling
Pub Date : 2026-01-20 DOI: 10.1021/acs.jcim.5c01924
Jean V Sampaio,Andrielly H S Costa,Aline O Albuquerque,Júlia S Souza,Diego S Almeida,Eduardo M Gaieta,Matheus V Almeida,Geraldo R Sartori,João H M Silva
The utilization of predictive tools has become increasingly prevalent in the development of biopharmaceuticals, reducing the time and cost of research. However, most methods for computational antibody design are hampered by their reliance on scarcely available antibody structures, potential for immunogenic modifications, and a restricted exploration of the paratope's potential chemical and conformational space. We propose Ab-SELDON, a modular and easily customizable antibody design pipeline capable of iteratively optimizing an antibody-antigen (Ab-Ag) interaction in five different modification steps, including CDR and framework grafting, and mutagenesis. The optimization process is guided by diversity data collected from millions of publicly available human antibody sequences. This approach enhanced the exploration of the chemical and conformational space of the paratope during computational tests involving the optimization of an anti-HER2 antibody. Optimization of another antibody against Gal-3BP stabilized the Ab-Ag interaction in molecular dynamics simulations at lower runtime than alternative pipelines. Tests with SKEMPI's Ab-Ag mutations also demonstrated the pipeline's ability to correctly identify the effect of the majority of mutations, especially multipoint and those that increased binding affinity. This freely available pipeline presents a new approach for computationally efficient and automated in silico antibody design, thereby facilitating the development of new biopharmaceuticals.
Title : Ab-SELDON: Leveraging Diversity Data for an Efficient Automated Computational Pipeline for Antibody Design. Journal : Journal of Chemical Information and Modeling
Pub Date : 2026-01-20 DOI: 10.1021/acs.jcim.5c02347
Qiao Ning,Shaohang Qiao,Yawen Cai,Yanpeng Liu,Hui Li,Qian Ma,Shikai Guo
Drug-target interactions (DTIs) are the basis of the therapeutic effect of drugs, and their accurate prediction helps reduce the cost and time of experimental screening in the drug development process. Current methods for DTI prediction often focus on molecular topological structure, which underrepresents spatial information such as the relative positions of atoms and bond angles, and they fail to effectively integrate molecular information with association network information. To address these issues, we propose a novel Geometry-enhanced Multiscale Joint Representation Learning method for drug-target interaction prediction (GMJRL). GMJRL not only considers the global information in the drug-target network at the macroscale, but also extracts geometric structure information on the drug and the target at the microscale, including bond angle information for the drug and atomic coordinate information for the target. To effectively fuse representations at different scales, we develop a joint representation learning method with self-attention, which captures correlations within each scale as well as interscale relationships, thus achieving effective fusion of the macroscale and microscale representations. Finally, this study introduces a negative sampling algorithm to select reliable negative samples from unlabeled drug-target pairs. Extensive experiments validate that GMJRL yields promising outcomes in predicting drug-target interactions.
Title : Geometry-Enhanced Multiscale Joint Representation Learning for Drug-Target Interaction Prediction. Journal : Journal of Chemical Information and Modeling
Pub Date : 2026-01-20 DOI: 10.1021/acs.jcim.5c02616
Wenjuan Yi,Zhengdong Xu,Dushuo Feng,Lulu Guan,Jiaxing Tang,Yu Zou
Cytoplasmic accumulation of the transactive response deoxyribonucleic acid (DNA)-binding protein of 43 kDa (TDP-43) aggregates represents the primary pathological hallmark of TDP-43 proteinopathies including amyotrophic lateral sclerosis (ALS) and chronic traumatic encephalopathy (CTE). Inhibiting TDP-43 aggregation or disrupting its preformed fibrils might be promising strategies to prevent or delay the development of TDP-43 proteinopathies. Recently, the green tea polyphenol epigallocatechin gallate (EGCG) was observed to prevent the formation of TDP-43 oligomeric species and fibrillar aggregates. Nevertheless, the atomic-level mechanism of this inhibition has been incompletely characterized. In this study, we performed multiple replica exchange with solute tempering 2 (REST2) and all-atom molecular dynamics (MD) simulations, totaling 46.8 μs, on TDP-43 models with and without EGCG. The REST2 simulation results revealed that EGCG impedes β-sheet formation and interferes with the interchain interactions of the TDP-43(304-348) dimer. Subsequent analyses show that EGCG could alter the free energy landscape and hinder the residue-residue interactions of the dimer. The binding analyses showed that EGCG preferentially bound to residues M307, F313, F316, W334, M339, Q344, and Q346, and that hydrophobic, polar, and π-π stacking interactions dominate the binding of EGCG to the dimer. Additional conventional MD simulations demonstrated that the protofibrillar tetramer is the minimal stable TDP-43(304-348) protofibril. Taking the tetramer as a protofibril model, we found that EGCG could reduce the structural stability and disrupt the β-sheet structure of the TDP-43(304-348) protofibril, thus destabilizing its higher-order structure. This investigation unveils the atomic-level mechanism by which EGCG acts against TDP-43 aggregation, which may provide fundamental knowledge for therapeutic strategies against TDP-43 proteinopathies.
Title : Computational Exploration of the Molecular Mechanism of Epigallocatechin Gallate against TDP-43 Aggregation. Journal : Journal of Chemical Information and Modeling
Pub Date : 2026-01-20 DOI: 10.1021/acs.jcim.5c02665
Yuzhuo Dai,Jianwei Zhao,Beibei Wang,Qing Liang,Ruo-Xu Gu
Phase separation in bilayers composed of a few lipid species is widely used as a model for exploring the lateral heterogeneity of complex cell membranes. Molecular dynamics (MD) simulations offer atomistic insights into coexisting lipid phases, but identifying these phases from trajectories remains challenging. Here, we present an unsupervised method for lipid phase recognition in phase-separated bilayers. In this method, the membrane plane is first discretized into pixels. For each pixel, the local lipid packing degree, defined as the atomic density within that pixel, is calculated. A threshold is then determined by fitting a two-component Gaussian mixture model (GMM) to the distribution of lipid packing degree, enabling phase state assignment to pixels and subsequent mapping back to lipids. Our method is applicable to different systems, regardless of their compositions or temperatures, thus minimizing potential artifacts. Tests on bilayers with diverse lipid compositions and temperatures show that our method outperforms the commonly used hidden Markov model (HMM) in both accuracy and robustness. Notably, in this method, phase recognition relies solely on a bilayer-intrinsic property (lipid packing degree), without requiring temporal information, labeled data, or assumptions about the local lipid environment. This makes our method broadly applicable to various tasks, including characterizing the phase transformation process before the system reaches equilibrium and identifying coexisting phases in protein-containing bilayers. In summary, we provide a robust and accurate framework for identifying coexisting phases in bilayers and tracking their dynamic transitions in simulations.
Title : Recognition of Coexisting Phases in Model Membranes via an Unsupervised Method. Journal : Journal of Chemical Information and Modeling
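The pixel-and-threshold procedure described in the membrane abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pixel_densities`, `fit_two_component_gmm`, `phase_threshold`, `demo`), the synthetic 10 x 10 box, the atom counts, and the plain-EM fit are all assumptions made for illustration, and the "packing degree" is taken here simply as atoms per unit pixel area.

```python
import math
import random


def pixel_densities(atoms_xy, box, n_pix):
    """Count atoms per pixel and divide by pixel area (packing degree in this sketch)."""
    size = box / n_pix
    counts = [[0] * n_pix for _ in range(n_pix)]
    for x, y in atoms_xy:
        i = min(int(x / size), n_pix - 1)
        j = min(int(y / size), n_pix - 1)
        counts[i][j] += 1
    return {(i, j): counts[i][j] / size ** 2
            for i in range(n_pix) for j in range(n_pix)}


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=100):
    """Plain EM for a 1D two-component Gaussian mixture."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Initialise the means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    grand_mean = sum(ordered) / len(ordered)
    spread = sum((v - grand_mean) ** 2 for v in ordered) / len(ordered)
    variances, weights = [spread, spread], [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for every value.
        resp0 = []
        for v in values:
            p0 = weights[0] * normal_pdf(v, means[0], variances[0])
            p1 = weights[1] * normal_pdf(v, means[1], variances[1])
            resp0.append(p0 / (p0 + p1))
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            total = sum(resp)
            weights[k] = total / len(values)
            means[k] = sum(r * v for r, v in zip(resp, values)) / total
            var = sum(r * (v - means[k]) ** 2 for r, v in zip(resp, values)) / total
            variances[k] = max(var, 1e-9)
    return weights, means, variances


def phase_threshold(weights, means, variances, n_bisect=60):
    """Density between the two means at which both components are equally likely.

    Assumes the two components are well separated, so the weighted densities
    cross exactly once between the means.
    """
    low, high = (0, 1) if means[0] < means[1] else (1, 0)

    def diff(x):
        return (weights[low] * normal_pdf(x, means[low], variances[low])
                - weights[high] * normal_pdf(x, means[high], variances[high]))

    a, b = means[low], means[high]
    for _ in range(n_bisect):
        mid = 0.5 * (a + b)
        if diff(mid) > 0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def demo(seed=0):
    """Synthetic bilayer: left half densely packed, right half loosely packed."""
    rng = random.Random(seed)
    box, n_pix = 10.0, 10
    atoms = []
    for i in range(n_pix):
        for j in range(n_pix):
            mean_count = 35 if i < n_pix // 2 else 20
            n_atoms = max(1, round(rng.gauss(mean_count, 1.5)))
            atoms += [(i + rng.random(), j + rng.random()) for _ in range(n_atoms)]
    density = pixel_densities(atoms, box, n_pix)
    weights, means, variances = fit_two_component_gmm(list(density.values()))
    threshold = phase_threshold(weights, means, variances)
    labels = {pix: "ordered" if d > threshold else "disordered"
              for pix, d in density.items()}
    return threshold, labels
```

Calling `demo()` returns the fitted threshold and a per-pixel phase label. Mapping back to lipids would then amount to looking up the label of the pixel that contains each lipid, which is again an assumption about how that step could be done.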
Pub Date : 2026-01-20 DOI: 10.1021/acs.jcim.5c02506
Iván Cortés,Cristina Cuadrado,José A Gavín,María Marta Zanardi,Antonio Hernández Daranas,Ariel M Sarotti
Quantum-mechanical NMR (QM-NMR) is widely used in structure elucidation. A long-sought holy grail in this field is solving structures from a simple ¹H NMR spectrum with AI-driven workflows. Yet solvent effects on chemical shifts, though long recognized, remain overlooked. We show in a theory-experiment study that implicit solvation models miss solvent-induced variations and introduce a Python tool to quantify solvent sensitivity, aiding more reliable QM-NMR structural assignments.
Title : Solvent Matters: Bridging Theory and Experiment in Quantum-Mechanical NMR Structural Elucidation. Journal : Journal of Chemical Information and Modeling
Pub Date : 2026-01-19 DOI: 10.1021/acs.jcim.5c02375
Erika M. Herrera Machado,Jakob L. Andersen,Rolf Fagerberg,Daniel Merkle
In this study, we introduce a sensitivity analysis methodology for stochastic systems in chemistry, where dynamics are often governed by random processes. Our approach is based on gradient estimation via finite differences, averaging simulation outcomes, and analyzing variability under intrinsic noise. We characterize gradient uncertainty as an angular range within which all plausible gradient directions are expected to lie. A key feature of our approach is that this uncertainty measure adaptively guides the number of simulations performed for each nominal-perturbation pair of points in order to minimize unnecessary computations while maintaining robustness. Systematically exploring a range of parameter values across the parameter space, rather than focusing on a single value, allows us to identify not only sensitive parameters but also regions of parameter space associated with different levels of sensitivity. These results are visualized through vector field plots to offer an intuitive representation of local sensitivity across parameter space. Additionally, global sensitivity coefficients over sampled points in the parameter space are computed to capture overall trends. Flexibility regarding the choice of output observable measures is another key feature of our method: while traditional sensitivity analyses often focus on species concentrations, our framework allows for the definition of a large range of problem-specific observables. This makes it broadly applicable in diverse chemical and biochemical scenarios. We demonstrate our approach on two systems: classical Michaelis–Menten kinetics and a rule-based model of the formose reaction, using the cheminformatics software MØD for Gillespie-based stochastic simulations.
Title : A Sensitivity Analysis Methodology for Rule-Based Stochastic Chemical Systems. Journal : Journal of Chemical Information and Modeling
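The ingredients named in the sensitivity-analysis abstract above (finite-difference gradients from averaged stochastic simulations, an angular uncertainty measure, and an adaptive number of simulations) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' method and not the MØD API: the Michaelis–Menten rate constants, the initial counts, the 20% relative step, the two-standard-error box used for the angular range, and the doubling rule are all hypothetical choices.

```python
import math
import random


def gillespie_mm(k1, k_1, k2, t_end, rng, e0=10, s0=50):
    """Product count at t_end for E + S <-> ES -> E + P (Gillespie direct method)."""
    e, s, es, p, t = e0, s0, 0, 0, 0.0
    while True:
        rates = [k1 * e * s, k_1 * es, k2 * es]
        total = sum(rates)
        if total == 0.0:
            return p
        t += rng.expovariate(total)
        if t > t_end:
            return p
        r = rng.random() * total
        if r < rates[0]:
            e, s, es = e - 1, s - 1, es + 1      # binding
        elif r < rates[0] + rates[1]:
            e, s, es = e + 1, s + 1, es - 1      # unbinding
        else:
            e, es, p = e + 1, es - 1, p + 1      # catalysis


def mean_and_sem(samples):
    n = len(samples)
    m = sum(samples) / n
    var = sum((x - m) ** 2 for x in samples) / (n - 1)
    return m, math.sqrt(var / n)


def fd_gradient(params, observable, n_rep, rng, rel_step=0.2):
    """Forward-difference change of the mean observable per unit relative
    change of each parameter, with a standard error for each component."""
    m0, s0 = mean_and_sem([observable(params, rng) for _ in range(n_rep)])
    grad, err = [], []
    for i, value in enumerate(params):
        shifted = list(params)
        shifted[i] = value * (1.0 + rel_step)
        m1, s1 = mean_and_sem([observable(shifted, rng) for _ in range(n_rep)])
        grad.append((m1 - m0) / rel_step)
        err.append(math.sqrt(s0 ** 2 + s1 ** 2) / rel_step)
    return grad, err


def angular_halfwidth(grad, err, z=2.0):
    """Largest angle between the 2D gradient and any corner of its +/- z*err box."""
    if all(abs(g) <= z * e for g, e in zip(grad, err)):
        return math.pi  # the box contains the origin: direction undetermined
    worst = 0.0
    for sx in (-1, 1):
        for sy in (-1, 1):
            corner = (grad[0] + sx * z * err[0], grad[1] + sy * z * err[1])
            dot = grad[0] * corner[0] + grad[1] * corner[1]
            norm = math.hypot(*grad) * math.hypot(*corner)
            worst = max(worst, math.acos(max(-1.0, min(1.0, dot / norm))))
    return worst


def adaptive_gradient(params, observable, rng, tol=math.radians(15),
                      n_rep=50, max_rep=1600):
    """Double the replicate count until the angular range is below tol.
    For simplicity each round re-simulates instead of reusing earlier runs."""
    while True:
        grad, err = fd_gradient(params, observable, n_rep, rng)
        width = angular_halfwidth(grad, err)
        if width <= tol or n_rep >= max_rep:
            return grad, width, n_rep
        n_rep *= 2


def product_at_20(params, rng):
    """Observable: product count at t = 20 as a function of (k1, k2)."""
    k1, k2 = params
    return gillespie_mm(k1, 0.1, k2, t_end=20.0, rng=rng)
```

A call such as `adaptive_gradient([0.01, 0.1], product_at_20, random.Random(1))` returns the scaled gradient, its angular half-width in radians, and the number of replicates used. Any other observable with the same `(params, rng)` signature can be substituted, which mirrors the flexibility in observables that the abstract emphasizes.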
Pub Date : 2026-01-19 DOI: 10.1021/acs.jcim.5c02017
F Alexander Sepúlveda,Daniel Cerro-Ramos,T Jesper Jacobsson
Accurately modeling the complex relationships among synthesis parameters, material compositions, and performance metrics is essential for accelerating the development of perovskite solar cells (PSCs). In this context, machine learning (ML) has proven to be a valuable tool. While most ML applications in PSC research rely on discriminative "black-box" models, this study adopts a generative approach by modeling the joint probability density function. We employ Gaussian Mixture Models (GMMs), a pragmatic and interpretable choice well-suited for the scarce, low-dimensional tabular data typical of PSC research. This single GMM framework is evaluated on five distinct tasks: discovering clusters, regression, generating novel configurations, training on data sets with missing data, and inverse design of the experimental (synthesis) conditions. That is, given the perovskite material composition and a target PCE, we infer the experimental conditions. For this last task we use a novel "GMM-Assisted Optimization" method, which proves more effective than standard random-start optimization, achieving an RMSE of 1.52 against target PCEs, less than half the 3.32 RMSE of the baseline. These findings highlight the power of probabilistic modeling for data-driven discovery in PSC research.
Title: Density Estimation Based on Mixtures of Gaussians for Perovskite Solar Cells Modeling. Authors: F Alexander Sepúlveda, Daniel Cerro-Ramos, T Jesper Jacobsson. Journal: Journal of Chemical Information and Modeling. Pub Date: 2026-01-19. DOI: 10.1021/acs.jcim.5c02017
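The regression task mentioned in the perovskite abstract follows from a standard property of a joint Gaussian mixture: the conditional mean of an output given an input is a responsibility-weighted sum of per-component linear predictions. The sketch below illustrates this for one input and one output. It is not the authors' implementation; the `Component` class, the function names, and all parameter values are hypothetical, and the mixture is assumed to be already fitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One Gaussian component of a joint density p(x, y) (hypothetical values)."""
    weight: float   # mixing proportion of the component
    mean_x: float   # mean of the input variable
    mean_y: float   # mean of the output variable
    var_x: float    # variance of the input variable
    cov_xy: float   # covariance between input and output


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def conditional_mean(components: list[Component], x: float) -> float:
    """Return E[y | x] under the joint mixture.

    Each component contributes its own linear regression line,
    weighted by how responsible that component is for the input x.
    """
    resp = [c.weight * normal_pdf(x, c.mean_x, c.var_x) for c in components]
    total = sum(resp)
    if total == 0.0:
        raise ValueError("x lies too far from every component to be scored")
    return sum(
        (r / total) * (c.mean_y + c.cov_xy / c.var_x * (x - c.mean_x))
        for r, c in zip(resp, components)
    )


# Two hypothetical clusters, placed symmetrically around x = 0.
mixture = [
    Component(weight=0.5, mean_x=-1.0, mean_y=0.0, var_x=1.0, cov_xy=0.0),
    Component(weight=0.5, mean_x=1.0, mean_y=10.0, var_x=1.0, cov_xy=0.0),
]
prediction = conditional_mean(mixture, 0.0)
```

Because the same joint density supports conditioning on any subset of variables, one fitted mixture can in principle serve both forward prediction and the inverse direction, which is the appeal of the generative formulation.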
Pub Date: 2026-01-19 DOI: 10.1021/acs.jcim.5c02375
Erika M. Herrera Machado,Jakob L. Andersen,Rolf Fagerberg,Daniel Merkle
In this study, we introduce a sensitivity analysis methodology for stochastic systems in chemistry, where dynamics are often governed by random processes. Our approach is based on gradient estimation via finite differences, averaging simulation outcomes, and analyzing variability under intrinsic noise. We characterize gradient uncertainty as an angular range within which all plausible gradient directions are expected to lie. A key feature of our approach is that this uncertainty measure adaptively guides the number of simulations performed for each nominal-perturbation pair of points in order to minimize unnecessary computations while maintaining robustness. Systematically exploring a range of parameter values across the parameter space, rather than focusing on a single value, allows us to identify not only sensitive parameters but also regions of parameter space associated with different levels of sensitivity. These results are visualized through vector field plots to offer an intuitive representation of local sensitivity across parameter space. Additionally, global sensitivity coefficients over sampled points in the parameter space are computed to capture overall trends. Flexibility regarding the choice of output observable measures is another key feature of our method: while traditional sensitivity analyses often focus on species concentrations, our framework allows for the definition of a large range of problem-specific observables. This makes it broadly applicable in diverse chemical and biochemical scenarios. We demonstrate our approach on two systems: classical Michaelis–Menten kinetics and a rule-based model of the formose reaction, using the cheminformatics software MØD for Gillespie-based stochastic simulations.
Title: A Sensitivity Analysis Methodology for Rule-Based Stochastic Chemical Systems. Journal: Journal of Chemical Information and Modeling. DOI: 10.1021/acs.jcim.5c02375
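The sensitivity analysis abstract combines three ideas: finite-difference gradients, averaging over repeated stochastic runs, and an angular measure of gradient uncertainty that decides how many runs are needed. A minimal sketch of that loop follows. It rests on several assumptions: `noisy_observable` is a hypothetical stand-in for a Gillespie simulation, the angular spread is taken as the largest angle between replicate gradients and their mean, and the batch size is doubled for all perturbations at once rather than per nominal-perturbation pair. None of this uses the MØD API, and the paper's exact definitions may differ.

```python
import math
import random


def noisy_observable(params, rng):
    """Hypothetical stand-in for one stochastic simulation run."""
    k1, k2 = params
    return 3.0 * k1 + 1.0 * k2 + rng.gauss(0.0, 0.05)


def mean_observable(params, n_runs, rng):
    """Average the observable over n_runs independent simulations."""
    return sum(noisy_observable(params, rng) for _ in range(n_runs)) / n_runs


def fd_gradient(params, step, n_runs, rng):
    """Forward finite-difference gradient of the averaged observable."""
    base = mean_observable(params, n_runs, rng)
    grad = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += step
        grad.append((mean_observable(shifted, n_runs, rng) - base) / step)
    return grad


def angle_between(u, v):
    """Angle in radians between two vectors; pi if either has zero length."""
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        return math.pi  # direction undefined, treat as maximally uncertain
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def adaptive_gradient(params, step=0.1, batch=20, n_estimates=8,
                      tol=0.1, max_batch=2000, seed=0):
    """Double the runs per estimate until the angular spread is within tol."""
    rng = random.Random(seed)
    while True:
        estimates = [fd_gradient(params, step, batch, rng)
                     for _ in range(n_estimates)]
        mean_grad = [sum(col) / n_estimates for col in zip(*estimates)]
        spread = max(angle_between(g, mean_grad) for g in estimates)
        if spread <= tol or batch >= max_batch:
            return mean_grad, spread, batch
        batch *= 2


grad, spread, runs_per_point = adaptive_gradient([1.0, 2.0])
```

Repeating the call over a grid of nominal parameter values would give the vectors for a field plot of local sensitivity, with `spread` indicating how far each arrow's direction can be trusted.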
Pub Date: 2026-01-18 DOI: 10.1021/acs.jcim.5c03021
Miriam Gulman,Jordan Chill,Dan Thomas Major
Understanding protein-peptide interactions is essential for uncovering cellular signaling mechanisms and advancing therapeutic development, as these interactions play central roles in numerous biological processes. Gaining structural insight into such complexes is crucial, yet traditional methods like nuclear magnetic resonance (NMR) and X-ray crystallography are often time-consuming and experimentally demanding. Computational approaches, including physics-based docking and deep-learning (DL) structure predictors such as AlphaFold3, Boltz-2, and Chai-1, offer powerful alternatives. Accurately modeling flexible peptides that bind to shallow, surface-exposed regions remains difficult for physics-based methods, and although multiple sequence alignment-driven DL models can achieve excellent performance in well-behaved systems, they too can struggle when the peptide adopts noncanonical conformations or when sequence identity is low. In such cases, distance restraints are often required to guide the docking toward accurate and biologically meaningful solutions, yet acquiring multiple high-quality restraints is often difficult. To address the limitations of physics-based and DL approaches, we developed a restraint scoring function that integrates evolutionary conservation, spatial proximity, and geometric distribution to assess the informativeness of restraint sets. This enables a more accurate evaluation of docking inputs and overcomes the shortcomings of relying solely on restraint count. Building on this framework, we introduce a minimal-restraint docking strategy capable of identifying optimized subsets of restraints that lead to high-quality structural models. We evaluate a comprehensive set of protein-peptide systems, including 43 SH3 domain complexes, 8 WW domain complexes, and 19 medium-difficulty cases from the PepPCBench benchmark.
Our approach shows that model quality improves as the restraint score increases, supporting restraint score as a simple, interpretable indicator of docking success. We further identify clear, domain-specific restraint-score thresholds for the SH3 and WW systems that enable accurate model selection. Together, these results offer a scalable and efficient strategy for structure prediction in data-limited contexts and lay the groundwork for restraint-informed modeling with quantifiable confidence, as well as a powerful foundation for data-efficient machine learning-based peptide-protein docking.
Title: Restraint Quality, Not Quantity, Predicts Peptide-Protein Docking Outcomes. Journal: Journal of Chemical Information and Modeling. DOI: 10.1021/acs.jcim.5c03021