Pub Date: 2026-01-21 DOI: 10.1021/acs.jcim.5c02441
Oleksandra Herasymenko, Madhushika Silva, Galen J. Correy, Abd Al-Aziz A. Abu-Saleh, Suzanne Ackloo, Cheryl Arrowsmith, Alan Ashworth, Fuqiang Ban, Hartmut Beck, Kevin P. Bishop, Hugo J. Bohórquez, Albina Bolotokova, Marko Breznik, Irene Chau, Yu Chen, Artem Cherkasov, Wim Dehaen, Dennis Della Corte, Katrin Denzinger, Niklas P. Doering, Kristina Edfeldt, Aled Edwards, Darren Fayne, Francesco Gentile, Elisa Gibson, Ozan Gokdemir, Anders Gunnarsson, Judith Günther, John J. Irwin, Jan Halborg Jensen, Rachel J. Harding, Alexander Hillisch, Laurent Hoffer, Anders Hogner, Ashley Hutchinson, Shubhangi Kandwal, Andrea Karlova, Kushal Koirala, Sergei Kotelnikov, Dima Kozakov, Juyong Lee, Soowon Lee, Uta Lessel, Sijie Liu, Xuefeng Liu, Peter Loppnau, Jens Meiler, Rocco Moretti, Yurii S. Moroz, Charuvaka Muvva, Tudor I. Oprea, Brooks Paige, Amit Pandit, Keunwan Park, Gennady Poda, Mykola V. Protopopov, Vera Pütter, Rahul Ravichandran, Didier Rognan, Edina Rosta, Yogesh Sabnis, Thomas Scott, Almagul Seitova, Purshotam Sharma, François Sindt, Minghu Song, Casper Steinmann, Rick Stevens, Valerij Talagayev, Valentyna V. Tararina, Olga Tarkhanova, Damon Tingey, John F. Trant, Dakota Treleaven, Alexander Tropsha, Patrick Walters, Jude Wells, Yvonne Westermaier, Gerhard Wolber, Lars Wortmann, Shuangjia Zheng, James S. Fraser*, and Matthieu Schapira*
The third Critical Assessment of Computational Hit-finding Experiments (CACHE) challenged computational teams to identify chemically novel ligands targeting the macrodomain 1 of SARS-CoV-2 Nsp3, a promising coronavirus drug target. Twenty-three groups deployed diverse design strategies to collectively select 1739 ligand candidates. While over 85% of the designed molecules were chemically novel, the best experimentally confirmed hits were structurally similar to previously published compounds. Confirming a trend observed in CACHE #1 and #2, two of the best-performing workflows used compounds selected by physics-based computational screening methods to train machine learning models able to rapidly screen large chemical libraries, while four others used exclusively physics-based approaches. Three pharmacophore searches and one fragment growing strategy were also part of the seven winning workflows. While active molecules discovered by CACHE #3 participants largely mimicked the adenine ring of the endogenous substrate, ADP-ribose, preserving the canonical chemotype commonly observed in previously reported Nsp3-Mac1 ligands, they still provide novel structure–activity relationship insights that may inform the development of future antivirals. Collectively, these results show that multiple molecular design strategies can efficiently converge on similar potent molecules.
{"title":"CACHE Challenge #3: Targeting the Nsp3 Macrodomain of SARS-CoV-2","authors":"Oleksandra Herasymenko, , , Madhushika Silva, , , Galen J. Correy, , , Abd Al-Aziz A. Abu-Saleh, , , Suzanne Ackloo, , , Cheryl Arrowsmith, , , Alan Ashworth, , , Fuqiang Ban, , , Hartmut Beck, , , Kevin P. Bishop, , , Hugo J. Bohórquez, , , Albina Bolotokova, , , Marko Breznik, , , Irene Chau, , , Yu Chen, , , Artem Cherkasov, , , Wim Dehaen, , , Dennis Della Corte, , , Katrin Denzinger, , , Niklas P. Doering, , , Kristina Edfeldt, , , Aled Edwards, , , Darren Fayne, , , Francesco Gentile, , , Elisa Gibson, , , Ozan Gokdemir, , , Anders Gunnarsson, , , Judith Günther, , , John J. Irwin, , , Jan Halborg Jensen, , , Rachel J. Harding, , , Alexander Hillisch, , , Laurent Hoffer, , , Anders Hogner, , , Ashley Hutchinson, , , Shubhangi Kandwal, , , Andrea Karlova, , , Kushal Koirala, , , Sergei Kotelnikov, , , Dima Kozakov, , , Juyong Lee, , , Soowon Lee, , , Uta Lessel, , , Sijie Liu, , , Xuefeng Liu, , , Peter Loppnau, , , Jens Meiler, , , Rocco Moretti, , , Yurii S. Moroz, , , Charuvaka Muvva, , , Tudor I. Oprea, , , Brooks Paige, , , Amit Pandit, , , Keunwan Park, , , Gennady Poda, , , Mykola V. Protopopov, , , Vera Pütter, , , Rahul Ravichandran, , , Didier Rognan, , , Edina Rosta, , , Yogesh Sabnis, , , Thomas Scott, , , Almagul Seitova, , , Purshotam Sharma, , , François Sindt, , , Minghu Song, , , Casper Steinmann, , , Rick Stevens, , , Valerij Talagayev, , , Valentyna V. Tararina, , , Olga Tarkhanova, , , Damon Tingey, , , John F. Trant, , , Dakota Treleaven, , , Alexander Tropsha, , , Patrick Walters, , , Jude Wells, , , Yvonne Westermaier, , , Gerhard Wolber, , , Lars Wortmann, , , Shuangjia Zheng, , , James S. 
Fraser*, , and , Matthieu Schapira*, ","doi":"10.1021/acs.jcim.5c02441","DOIUrl":"10.1021/acs.jcim.5c02441","url":null,"abstract":"<p >The third <i>Critical Assessment of Computational Hit-finding Experiments</i> (CACHE) challenged computational teams to identify chemically novel ligands targeting the macrodomain 1 of SARS-CoV-2 Nsp3, a promising coronavirus drug target. Twenty-three groups deployed diverse design strategies to collectively select 1739 ligand candidates. While over 85% of the designed molecules were chemically novel, the best experimentally confirmed hits were structurally similar to previously published compounds. Confirming a trend observed in CACHE #1 and #2, two of the best-performing workflows used compounds selected by physics-based computational screening methods to train machine learning models able to rapidly screen large chemical libraries, while four others used exclusively physics-based approaches. Three pharmacophore searches and one fragment growing strategy were also part of the seven winning workflows. While active molecules discovered by CACHE #3 participants largely mimicked the adenine ring of the endogenous substrate, ADP-ribose, preserving the canonical chemotype commonly observed in previously reported Nsp3-Mac1 ligands, they still provide novel structure–activity relationship insights that may inform the development of future antivirals. 
Collectively, these results show that multiple molecular design strategies can efficiently converge on similar potent molecules.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"66 3","pages":"1566–1581"},"PeriodicalIF":5.3,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c02441","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-20 DOI: 10.1021/acs.jcim.5c01924
Jean V. Sampaio, Andrielly H. S. Costa, Aline O. Albuquerque, Júlia S. Souza, Diego S. Almeida, Eduardo M. Gaieta, Matheus V. Almeida, Geraldo R. Sartori*, and João H. M. Silva*
Predictive tools are increasingly used in the development of biopharmaceuticals, reducing the time and cost of research. However, most methods for computational antibody design are hampered by their reliance on scarce antibody structures, their potential for immunogenic modifications, and a restricted exploration of the paratope’s potential chemical and conformational space. We propose Ab-SELDON, a modular and easily customizable antibody design pipeline capable of iteratively optimizing an antibody–antigen (Ab–Ag) interaction in five different modification steps, including CDR and framework grafting, and mutagenesis. The optimization process is guided by diversity data collected from millions of publicly available human antibody sequences. This approach enhanced the exploration of the chemical and conformational space of the paratope during computational tests involving the optimization of an anti-HER2 antibody. Optimization of another antibody against Gal-3BP stabilized the Ab–Ag interaction in molecular dynamics simulations at a lower runtime than alternative pipelines. Tests with SKEMPI’s Ab–Ag mutations also demonstrated the pipeline’s ability to correctly identify the effect of the majority of mutations, especially multipoint mutations and those that increased binding affinity. This freely available pipeline presents a new approach for computationally efficient and automated in silico antibody design, thereby facilitating the development of new biopharmaceuticals.
{"title":"Ab-SELDON: Leveraging Diversity Data for an Efficient Automated Computational Pipeline for Antibody Design","authors":"Jean V. Sampaio, , , Andrielly H. S. Costa, , , Aline O. Albuquerque, , , Júlia S. Souza, , , Diego S. Almeida, , , Eduardo M. Gaieta, , , Matheus V. Almeida, , , Geraldo R. Sartori*, , and , João H. M. Silva*, ","doi":"10.1021/acs.jcim.5c01924","DOIUrl":"10.1021/acs.jcim.5c01924","url":null,"abstract":"<p >The utilization of predictive tools has become increasingly prevalent in the development of biopharmaceuticals, reducing the time and cost of research. However, most methods for computational antibody design are hampered by their reliance on scarcely available antibody structures, potential for immunogenic modifications, and a restricted exploration of the paratope’s potential chemical and conformational space. We propose Ab-SELDON, a modular and easily customizable antibody design pipeline capable of iteratively optimizing an antibody–antigen (Ab–Ag) interaction in five different modification steps, including CDR and framework grafting, and mutagenesis. The optimization process is guided by diversity data collected from millions of publicly available human antibody sequences. This approach enhanced the exploration of the chemical and conformational space of the paratope during computational tests involving the optimization of an anti-HER2 antibody. Optimization of another antibody against Gal-3BP stabilized the Ab-Ag interaction in molecular dynamics simulations at lower runtime than alternative pipelines. Tests with SKEMPI’s Ab-Ag mutations also demonstrated the pipeline’s ability to correctly identify the effect of the majority of mutations, especially multipoint and those that increased binding affinity. 
This freely available pipeline presents a new approach for computationally efficient and automated <i>in silico</i> antibody design, thereby facilitating the development of new biopharmaceuticals.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"66 3","pages":"1895–1905"},"PeriodicalIF":5.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c01924","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146005047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Drug-target interactions (DTIs) are the basis of the therapeutic effect of drugs, and their accurate prediction helps reduce the cost and time of experimental screening in the drug development process. Current methods for DTI prediction often focus on molecular topological structure, which underuses spatial information such as the relative positions of atoms and bond angles, and they fail to effectively integrate molecular information with association network information. To address these issues, we propose a novel Geometry-enhanced Multiscale Joint Representation Learning method for drug-target interaction prediction (GMJRL). GMJRL not only considers the global information in the drug-target network at the macroscale, but also extracts geometric structure information on the drug and the target at the microscale, including bond angle information for the drug and atomic coordinate information for the target. To effectively fuse representations at different scales, we develop a joint representation learning method with self-attention, which captures correlations within the same scale and accounts for interscale relationships, thus achieving effective fusion of the macroscale and microscale representations. Finally, this study introduces a negative sampling algorithm to select reliable negative samples from unlabeled drug-target pairs. Extensive experiments validate that GMJRL yields promising outcomes in predicting drug-target interactions.
{"title":"Geometry-Enhanced Multiscale Joint Representation Learning for Drug-Target Interaction Prediction","authors":"Qiao Ning*, , , Shaohang Qiao, , , Yawen Cai, , , Yanpeng Liu, , , Hui Li*, , , Qian Ma, , and , Shikai Guo, ","doi":"10.1021/acs.jcim.5c02347","DOIUrl":"10.1021/acs.jcim.5c02347","url":null,"abstract":"<p >Drug-target interactions (DTIs) are the basis of the therapeutic effect of drugs, whose accurate prediction helps reduce the cost and time of experimental screening in drug development process. Present methods for DTIs prediction often focus on the study of molecular topological structure, which weakens spatial information such as the relative position of atoms and bond angle, and fail to effectively integrate molecular information with association network information. To address this issue, we propose a novel Geometry-enhanced Multiscale Joint Representation Learning method for drug-target interaction prediction (GMJRL). GMJRL not only considers the global information in the drug-target network from the macro-scale, but also extracts the geometric structure information on the drug and the target from the microscale, including the bond angle information on the drug and the atomic coordinate information on the target. To effectively fuse different scale representations, we develop a joint representation learning method with self-attention, which can capture correlations within the same scale and consider the interscale relationships, thus achieving effective fusion of the macro-scale and microscale representations. Finally, this study introduces a negative sampling algorithm to select reliable negative samples from unlabeled drug-target pairs. 
Extensive experiments validate that GMJRL yields promising outcomes in predicting drug-target interactions.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"66 3","pages":"1906–1919"},"PeriodicalIF":5.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146005487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-20 DOI: 10.1021/acs.jcim.5c02616
Wenjuan Yi, Zhengdong Xu, Dushuo Feng, Lulu Guan, Jiaxing Tang, Yu Zou
Cytoplasmic accumulation of aggregates of the transactive response deoxyribonucleic acid (DNA)-binding protein of 43 kDa (TDP-43) represents the primary pathological hallmark of TDP-43 proteinopathies, including amyotrophic lateral sclerosis (ALS) and chronic traumatic encephalopathy (CTE). Inhibiting TDP-43 aggregation or disrupting its preformed fibrils might be promising strategies to prevent or delay the development of TDP-43 proteinopathies. Recently, the green tea polyphenol epigallocatechin gallate (EGCG) was observed to prevent the formation of TDP-43 oligomeric species and fibrillar aggregates. Nevertheless, the atomic-level mechanism of this inhibition has been incompletely characterized. In this study, we performed multiple replica exchange with solute tempering 2 (REST2) and all-atom molecular dynamics (MD) simulations, 46.8 μs in total, on TDP-43 models with and without EGCG. The REST2 simulation results revealed that EGCG impedes β-sheet structure formation and interferes with the interchain interactions of the TDP-43(304-348) dimer. Subsequent analyses show that EGCG could alter the free energy landscape and hinder the residue-residue interactions of the dimer. The binding analyses confirmed that EGCG preferentially bound to residues M307, F313, F316, W334, M339, Q344, and Q346, and that hydrophobic, polar, and π-π stacking interactions dominate the binding of EGCG to the dimer. Additional conventional MD simulations demonstrated that the protofibrillar tetramer is the minimal stable TDP-43(304-348) protofibril. Taking the tetramer as a protofibril model, we found that EGCG could reduce the structural stability and disrupt the β-sheet structure of the TDP-43(304-348) protofibril, thus destabilizing its higher-order structure. This investigation unveils the atomic-level mechanism by which EGCG acts against TDP-43 aggregation, which may provide fundamental knowledge for therapeutic strategies against TDP-43 proteinopathies.
{"title":"Computational Exploration of the Molecular Mechanism of Epigallocatechin Gallate against TDP-43 Aggregation.","authors":"Wenjuan Yi,Zhengdong Xu,Dushuo Feng,Lulu Guan,Jiaxing Tang,Yu Zou","doi":"10.1021/acs.jcim.5c02616","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c02616","url":null,"abstract":"Cytoplasmic accumulation of the transactive response deoxyribonucleic acid (DNA)-binding protein of 43 kDa (TDP-43) aggregates represents the primary pathological hallmark of TDP-43 proteinopathies including amyotrophic lateral sclerosis (ALS) and chronic traumatic encephalopathy (CTE). Inhibiting TDP-43 aggregation or disrupting its preformed fibrils might be promising strategies to prevent or delay the development of TDP-43 proteinopathies. Recently, the green tea polyphenol, epigallocatechin gallate (EGCG), was observed to prevent the formation of TDP-43 oligomeric species and fibrillar aggregates. Nevertheless, the atomic-level mechanism of this inhibition has been incompletely characterized. In this study, we performed a multitude of replica exchange with solute tempering 2 (REST2) and all-atom molecular dynamics (MD) simulations of 46.8 μs in total on TDP-43 models with and without EGCG. The REST2 simulation results revealed that EGCG impedes the β-sheet structure formation and interferes the interchain interaction of TDP-43304-348 dimer. Subsequent analyses show that EGCG could alter the distribution of free energy landscape and hinder the residue-residue interaction of the dimer. The binding analyses confirmed that EGCG preferentially bound to M307, F313, F316, W334, M339, Q344, and Q346 residues, and hydrophobic, polar, and π-π stacking interactions dominate the binding of EGCG on the dimer. Additional conventional molecular dynamics (MD) simulations demonstrated that the protofibrillar tetramer is the minimal stable TDP-43304-348 protofibril. 
Taking the tetramer as a protofibril model, we found that EGCG could reduce the structural stability and disrupt the β-sheet structure of TDP-43304-348 protofibril, thus possessing a destabilization effect on its higher-order structure. This investigation unveils the atomic-level mechanism by which EGCG against TDP-43 aggregation, which may provide potential fundamental knowledge of therapeutic strategies for TDP-43 proteinopathies.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"44 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146005485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phase separation in bilayers composed of a few lipid species is widely used as a model for exploring the lateral heterogeneity of complex cell membranes. Molecular dynamics (MD) simulations offer atomistic insights into coexisting lipid phases. But identifying these phases from trajectories remains challenging. Here, we present an unsupervised method for lipid phase recognition in phase-separated bilayers. In this method, the membrane plane is first discretized into pixels. For each pixel, the local lipid packing degree, which is defined as the atomic density within that pixel, is calculated and assigned to the corresponding pixel. A threshold is then determined by fitting a two-component Gaussian mixture model (GMM) to the distribution of lipid packing degree, enabling phase state assignment to pixels and subsequent mapping back to lipids. Our method is applicable to different systems, regardless of their compositions or temperatures, thus minimizing potential artifacts. Tests on bilayers with diverse lipid compositions and temperatures show that our method outperforms the commonly used hidden Markov model (HMM) in both accuracy and robustness. Notably, in this method, phase recognition relies solely on bilayer-intrinsic properties (lipid packing degree), without requiring temporal information, labeled data, or assumptions about the local lipid environment. This makes our method broadly applicable to various tasks, including characterizing the phase transformation process before the system reaches equilibration and identifying coexisting phases in protein-containing bilayers. In summary, we provide a robust and accurate framework for identifying coexisting phases in bilayers and tracking their dynamic transitions in simulations.
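The thresholding step described in this abstract can be sketched in a few lines. What follows is a minimal illustration, not the authors' implementation: it fits a two-component one-dimensional Gaussian mixture by expectation-maximization to per-pixel packing values and places the threshold where the two weighted component densities cross. The function names, the synthetic densities, and the crossing-point rule are all assumptions made for this example, and the pixel discretization and mapping back to lipids are omitted.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """EM fit of a two-component 1D Gaussian mixture (hypothetical helper)."""
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise the two components from the lower and upper halves of the data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = (xs[-1] - xs[0]) / 4.0 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for every value.
        resp0 = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            resp0.append(p0 / (p0 + p1) if p0 + p1 > 0.0 else 0.5)
        # M-step: update weight, mean and standard deviation of each component.
        for k in (0, 1):
            r = resp0 if k == 0 else [1.0 - v for v in resp0]
            nk = max(sum(r), 1e-12)
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
            weights[k] = nk / len(xs)
    return weights, means, sds


def phase_threshold(weights, means, sds, tol=1e-9):
    """Bisection for the point between the means where both components are equally likely."""
    a, b = (0, 1) if means[0] <= means[1] else (1, 0)
    lo, hi = means[a], means[b]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        low_side = weights[a] * normal_pdf(mid, means[a], sds[a])
        high_side = weights[b] * normal_pdf(mid, means[b], sds[b])
        if low_side > high_side:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic per-pixel packing values: two populations with made-up densities.
rng = random.Random(0)
densities = [rng.gauss(1.0, 0.1) for _ in range(300)]
densities += [rng.gauss(2.0, 0.1) for _ in range(300)]

weights, means, sds = fit_two_gaussians(densities)
threshold = phase_threshold(weights, means, sds)
# Pixels above the threshold are assigned to the more densely packed phase.
labels = ["packed" if d > threshold else "loose" for d in densities]
```

Because the threshold comes from the fitted distribution rather than a fixed cutoff, the same code can be rerun on a different composition or temperature without retuning, which is the property the abstract emphasizes.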
{"title":"Recognition of Coexisting Phases in Model Membranes via an Unsupervised Method","authors":"Yuzhuo Dai, , , Jianwei Zhao, , , Beibei Wang*, , , Qing Liang*, , and , Ruo-Xu Gu*, ","doi":"10.1021/acs.jcim.5c02665","DOIUrl":"10.1021/acs.jcim.5c02665","url":null,"abstract":"<p >Phase separation in bilayers composed of a few lipid species is widely used as a model for exploring the lateral heterogeneity of complex cell membranes. Molecular dynamics (MD) simulations offer atomistic insights into coexisting lipid phases. But identifying these phases from trajectories remains challenging. Here, we present an unsupervised method for lipid phase recognition in phase-separated bilayers. In this method, the membrane plane is first discretized into pixels. For each pixel, the local lipid packing degree, which is defined as the atomic density within that pixel, is calculated and assigned to the corresponding pixel. A threshold is then determined by fitting a two-component Gaussian mixture model (GMM) to the distribution of lipid packing degree, enabling phase state assignment to pixels and subsequent mapping back to lipids. Our method is applicable to different systems, regardless of their compositions or temperatures, thus minimizing potential artifacts. Tests on bilayers with diverse lipid compositions and temperatures show that our method outperforms the commonly used hidden Markov model (HMM) in both accuracy and robustness. Notably, in this method, phase recognition relies solely on bilayer-intrinsic properties (lipid packing degree), without requiring temporal information, labeled data, or assumptions about the local lipid environment. This makes our method broadly applicable to various tasks, including characterizing the phase transformation process before the system reaches equilibration and identifying coexisting phases in protein-containing bilayers. 
In summary, we provide a robust and accurate framework for identifying coexisting phases in bilayers and tracking their dynamic transitions in simulations.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"66 3","pages":"1840–1851"},"PeriodicalIF":5.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146005488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bitterness, alongside sour, sweet, umami, and salty tastes, constitutes one of the five basic tastes and serves as a key dimension in shaping food flavor profiles. Food protein processing readily generates bitter peptides, whose intense bitterness often leads to consumer rejection, yet these peptides frequently carry beneficial bioactivities, forcing a trade-off between flavor and functionality. This calls for quantitative assessment of bitterness intensity in the early stages of product development. However, experimental assays relying on sensory evaluation and electronic tongue instruments are complex, costly, and limited in throughput, constraining the systematic identification of bitter peptides and process optimization. Here, we present BIPE (Bitterness Intensity Prediction Engine), an end-to-end regression model that integrates ESM3 protein language model representations with a multilayer perceptron readout, performing regression of bitterness thresholds in log space to directly assess bitterness intensity from sequence alone. BIPE achieves R² = 0.9050 under 10-fold cross-validation and R² = 0.9449 on an independent test set. BIPE accurately reproduces trends in both electronic tongue readouts and human sensory scores, demonstrating consistent external validity across assays. In addition, BIPE accurately differentiates the bitterness intensities of soybean protein hydrolysates produced by multiple commercial proteases. Finally, systematic scanning of the complete pentapeptide sequence space by BIPE further reveals amino acid compositional patterns associated with bitterness, providing mechanistic insights. By advancing from classification to quantitative regression, BIPE enables rational design of low-bitterness peptides, supports flavor engineering and process optimization, and establishes a reusable baseline for taste modeling.
{"title":"BIPE: Artificial Intelligence-Driven Peptide Bitterness Intensity Prediction Engine","authors":"Jianda Yue, , , Hua Tan, , , Jiawei Xu, , , Tingting Li, , , Zihui Chen, , , Xie Li, , , Zhaoyang Tang, , , Songping Liang, , , Zhonghua Liu*, , and , Ying Wang*, ","doi":"10.1021/acs.jcim.5c02678","DOIUrl":"10.1021/acs.jcim.5c02678","url":null,"abstract":"<p >Bitterness, alongside sour, sweet, umami, and salty tastes, constitutes one of the five basic tastes and serves as a key dimension in shaping food flavor profiles. Food protein processing readily generates bitter peptides, whose intense bitterness often leads to consumer rejection, yet these peptides frequently carry beneficial bioactivities, necessitating a trade-off between flavor and functionality. This necessitates the quantitative assessment of bitterness intensity in the early stages of product development. However, experimental assays relying on sensory evaluation and electronic tongue instruments are complex, costly, and limited in throughput, constraining the systematic identification of bitter peptides and process optimization. Here, we present BIPE (<u>B</u>itterness <u>I</u>ntensity <u>P</u>rediction <u>E</u>ngine), an end-to-end regression model that integrates ESM3 protein language model representations with a multilayer perceptron readout, performing regression of bitterness thresholds in log space to directly assess bitterness intensity from sequence alone. BIPE achieves <i>R</i><sup>2</sup> = 0.9050 under 10-fold cross-validation and <i>R</i><sup>2</sup> = 0.9449 on an independent test set. BIPE accurately reproduces trends in both electronic tongue readouts and human sensory scores, demonstrating a consistent external validity across assays. Besides, BIPE accurately differentiates the bitterness intensities of soybean protein hydrolysates produced by multiple commercial proteases. 
Finally, systematic scanning of the complete pentapeptide sequence space by BIPE further reveals amino acid compositional patterns associated with bitterness, providing mechanistic insights. By advancing from classification to quantitative regression, BIPE enables rational design of low-bitterness peptides, supports flavor engineering and process optimization, and establishes a reusable baseline for taste modeling.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"66 3","pages":"1522–1538"},"PeriodicalIF":5.3,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146002639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-20 DOI: 10.1021/acs.jcim.5c02506
Iván Cortés, Cristina Cuadrado, José A Gavín, María Marta Zanardi, Antonio Hernández Daranas, Ariel M Sarotti
Quantum-mechanical NMR (QM-NMR) is widely used in structure elucidation. A long-sought holy grail in this field is solving structures from a simple ¹H NMR spectrum with AI-driven workflows. Yet, solvent effects on chemical shifts, though long recognized, remain overlooked. We show in a theory-experiment study that implicit solvation models miss solvent-induced variations and introduce a Python tool to quantify solvent sensitivity, aiding more reliable QM-NMR structural assignments.
{"title":"Solvent Matters: Bridging Theory and Experiment in Quantum-Mechanical NMR Structural Elucidation.","authors":"Iván Cortés,Cristina Cuadrado,José A Gavín,María Marta Zanardi,Antonio Hernández Daranas,Ariel M Sarotti","doi":"10.1021/acs.jcim.5c02506","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c02506","url":null,"abstract":"Quantum-mechanical NMR (QM-NMR) is widely used in structure elucidation. A long-sought holy grail in this field is solving structures from a simple ¹H NMR spectrum with AI-driven workflows. Yet, solvent effects on chemical shifts, though long recognized, remain overlooked. We show in a theory-experiment study that implicit solvation models miss solvent-induced variations and introduce a Python tool to quantify solvent sensitivity, aiding more reliable QM-NMR structural assignments.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"99 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146005051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-19 DOI: 10.1021/acs.jcim.5c02375
Erika M. Herrera Machado*, Jakob L. Andersen*, Rolf Fagerberg*, and Daniel Merkle*
In this study, we introduce a sensitivity analysis methodology for stochastic systems in chemistry, where dynamics are often governed by random processes. Our approach is based on gradient estimation via finite differences, averaging simulation outcomes, and analyzing variability under intrinsic noise. We characterize gradient uncertainty as an angular range within which all plausible gradient directions are expected to lie. A key feature of our approach is that this uncertainty measure adaptively guides the number of simulations performed for each nominal-perturbation pair of points in order to minimize unnecessary computations while maintaining robustness. Systematically exploring a range of parameter values across the parameter space, rather than focusing on a single value, allows us to identify not only sensitive parameters but also regions of parameter space associated with different levels of sensitivity. These results are visualized through vector field plots to offer an intuitive representation of local sensitivity across parameter space. Additionally, global sensitivity coefficients over sampled points in the parameter space are computed to capture overall trends. Flexibility regarding the choice of output observable measures is another key feature of our method: while traditional sensitivity analyses often focus on species concentrations, our framework allows for the definition of a large range of problem-specific observables. This makes it broadly applicable in diverse chemical and biochemical scenarios. We demonstrate our approach on two systems: classical Michaelis–Menten kinetics and a rule-based model of the formose reaction, using the cheminformatics software MØD for Gillespie-based stochastic simulations.
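The core loop of this abstract (finite-difference gradients averaged over noisy runs, with an uncertainty measure that decides how many runs to spend) can be illustrated with a toy example. The sketch below is not the authors' algorithm and does not use MØD: `simulate` is a hypothetical stand-in for a Gillespie simulation, and the angular range is an assumed construction that takes the spread of directions over the corners of a box of two standard errors around the estimated gradient, in two dimensions only.

```python
import math
import random


def simulate(params, rng):
    """Hypothetical noisy observable: a linear response plus Gaussian noise."""
    k1, k2 = params
    return 2.0 * k1 + 1.0 * k2 + rng.gauss(0.0, 0.1)


def fd_gradient(params, h, n_runs, rng):
    """Forward-difference gradient of the mean observable with standard errors."""
    grad, sems = [], []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += h
        # One difference quotient per nominal-perturbation pair of runs.
        quotients = [
            (simulate(shifted, rng) - simulate(params, rng)) / h
            for _ in range(n_runs)
        ]
        mean = sum(quotients) / n_runs
        var = sum((q - mean) ** 2 for q in quotients) / (n_runs - 1)
        grad.append(mean)
        sems.append(math.sqrt(var / n_runs))
    return grad, sems


def angular_range(grad, sems, z=2.0):
    """Spread in radians of directions over the corners of grad +/- z*sem.

    Assumed definition for illustration; it ignores the wrap-around of
    atan2 at +/-pi and is not meaningful if the box contains the origin.
    """
    angles = [
        math.atan2(grad[1] + sy * z * sems[1], grad[0] + sx * z * sems[0])
        for sx in (-1, 1)
        for sy in (-1, 1)
    ]
    return max(angles) - min(angles)


def adaptive_gradient(params, h=0.1, tol=0.1, start=50, max_runs=20000, seed=0):
    """Double the number of runs until the angular range falls below tol."""
    rng = random.Random(seed)
    n_runs = start
    while True:
        grad, sems = fd_gradient(params, h, n_runs, rng)
        spread = angular_range(grad, sems)
        if spread <= tol or n_runs >= max_runs:
            return grad, spread, n_runs
        n_runs *= 2


grad, spread, n_used = adaptive_gradient((1.0, 1.0))
print(f"gradient ~ ({grad[0]:.2f}, {grad[1]:.2f}), "
      f"angular range {spread:.3f} rad, {n_used} runs per parameter")
```

Repeating `adaptive_gradient` over a grid of parameter points would give the arrows of a vector field plot like the one the abstract mentions, with the run count varying from point to point according to how noisy the local gradient direction is.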
Title: A Sensitivity Analysis Methodology for Rule-Based Stochastic Chemical Systems. Journal: Journal of Chemical Information and Modeling, vol. 66, no. 3, pp. 1637–1651.
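The finite-difference gradient estimation over averaged stochastic simulations described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation and does not use MØD: it is a plain-Python Gillespie simulation of Michaelis–Menten kinetics, with a fixed number of runs per point instead of the adaptive, uncertainty-guided scheme the abstract mentions. The function names, rate constants, molecule counts, step size, and the choice of observable (product count at a fixed time) are all assumptions made for illustration.

```python
import random


def gillespie_mm(k1, k_minus1, k2, e0=10, s0=50, t_end=5.0, rng=None):
    """Simulate E + S <-> ES -> E + P with Gillespie's direct method.

    Returns the number of product molecules present at time t_end.
    All default values are hypothetical, chosen only for illustration.
    """
    rng = rng or random.Random()
    e, s, es, p = e0, s0, 0, 0
    t = 0.0
    while True:
        a1 = k1 * e * s       # propensity of binding
        a2 = k_minus1 * es    # propensity of unbinding
        a3 = k2 * es          # propensity of catalysis
        a0 = a1 + a2 + a3
        if a0 == 0.0:
            break             # no reaction can fire any more
        t += rng.expovariate(a0)  # waiting time to the next reaction
        if t > t_end:
            break
        r = rng.random() * a0     # pick which reaction fires
        if r < a1:
            e, s, es = e - 1, s - 1, es + 1
        elif r < a1 + a2:
            e, s, es = e + 1, s + 1, es - 1
        else:
            e, es, p = e + 1, es - 1, p + 1
    return p


def mean_product(params, n_runs, seed):
    """Average the observable over n_runs independent simulations."""
    rng = random.Random(seed)
    return sum(gillespie_mm(*params, rng=rng) for _ in range(n_runs)) / n_runs


def fd_gradient(params, rel_step=0.2, n_runs=200, seed=0):
    """Forward finite-difference estimate of d<P>/d(param) for each parameter."""
    base = mean_product(params, n_runs, seed)
    grad = []
    for i, value in enumerate(params):
        h = rel_step * value
        shifted = list(params)
        shifted[i] = value + h
        perturbed = mean_product(shifted, n_runs, seed + i + 1)
        grad.append((perturbed - base) / h)
    return grad


if __name__ == "__main__":
    # (k1, k_minus1, k2): hypothetical nominal parameter values.
    print(fd_gradient((0.01, 0.1, 0.5)))
```

Because each mean is itself noisy, the estimated gradient fluctuates from seed to seed; increasing `n_runs` or the step size reduces that noise at the cost of more simulations or a coarser derivative, which is the trade-off an adaptive scheme would manage.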
Pub Date : 2026-01-19 DOI: 10.1021/acs.jcim.5c02017
F. Alexander Sepúlveda, Daniel Cerro-Ramos, T. Jesper Jacobsson
Accurately modeling the complex relationships among synthesis parameters, material compositions, and performance metrics is essential for accelerating the development of perovskite solar cells (PSCs). In this context, machine learning (ML) has proven to be a valuable tool. While most ML applications in PSC research rely on discriminative "black-box" models, this study adopts a generative approach by modeling the joint probability density function. We employ Gaussian Mixture Models (GMMs), a pragmatic and interpretable choice well-suited for the scarce, low-dimensional tabular data typical of PSC research. This single GMM framework is evaluated on five distinct tasks: discovering clusters, regression, generating novel configurations, training on data sets with missing data, and inverse design of the experimental (synthesis) conditions. That is, given the perovskite material composition and a target power conversion efficiency (PCE), we infer the experimental conditions. For this last task we use a novel "GMM-Assisted Optimization" method, which proves more effective than standard random-start optimization, achieving an RMSE of 1.52 against target PCEs, less than half the 3.32 RMSE of the baseline. These findings highlight the power of probabilistic modeling for data-driven discovery in PSC research.
Title: Density Estimation Based on Mixtures of Gaussians for Perovskite Solar Cells Modeling. Journal: Journal of Chemical Information and Modeling.
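To make concrete how a single joint-density GMM can serve as a regressor, as the abstract above describes, the sketch below computes the conditional mean E[y | x] of a two-dimensional Gaussian mixture in closed form. It is only an illustration under stated assumptions: the mixture parameters are hypothetical hand-picked numbers rather than values fitted to PSC data, x and y stand in for a single input feature and a target such as PCE, and the paper's "GMM-Assisted Optimization" for inverse design is not reproduced here.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_conditional_mean(x, components):
    """Return E[y | x] under a Gaussian mixture over the pair (x, y).

    Each component is a tuple (weight, mean_x, mean_y, var_x, cov_xy).
    The variance of y is not needed for the conditional mean.
    """
    # Responsibility of each component given x: weight times marginal density of x.
    resp = [w * normal_pdf(x, mx, vx) for w, mx, _, vx, _ in components]
    total = sum(resp)
    if total == 0.0:
        raise ValueError("x is too far from every component to condition on")
    y = 0.0
    for r, (_, mx, my, vx, cxy) in zip(resp, components):
        # Per-component linear regression of y on x, weighted by responsibility.
        y += (r / total) * (my + cxy / vx * (x - mx))
    return y


if __name__ == "__main__":
    # Hypothetical two-cluster mixture: (weight, mean_x, mean_y, var_x, cov_xy).
    mixture = [(0.5, -1.0, 0.0, 1.0, 0.0), (0.5, 1.0, 10.0, 1.0, 0.0)]
    for x in (-3.0, 0.0, 3.0):
        print(x, gmm_conditional_mean(x, mixture))
```

The same conditioning works in either direction, since the mixture models the joint density: swapping the roles of x and y gives a prediction of the input from the target, which is the basic idea that makes a generative model usable for inverse problems.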