Foundation and multimodal models are rapidly becoming a core methodology in molecular informatics, particularly for drug discovery, by leveraging large-scale pretraining across sequences, graphs, 3D structures, and text. This mini-review provides practical guidance on when these models help, how to choose representations and data, and how to design pretraining and adaptation pipelines for real-world use. We clarify what qualifies as a foundation model in chemistry; compare chemical language models, graph-based architectures, and 3D equivariant networks; review multimodal strategies that connect molecules with proteins, pockets, and natural language; and summarize diffusion-based generative modeling. We also emphasize rigorous evaluation, discussing realistic splitting protocols, distribution shift, activity cliffs, uncertainty calibration, and conformal prediction in the context of widely used benchmarks.
Foundation and Multimodal Models for Drug Discovery in Molecular Informatics: Principles, Evaluation, and Practical Guidance. Emmanuel Pio Pastore, Francesco De Rango. Molecular Informatics 45(3), e70027, 2026-03-01. Journal article. DOI: 10.1002/minf.70027. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13014059/pdf/
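The abstract above lists conformal prediction among the evaluation tools. As a rough illustration only, the following sketch shows split conformal prediction for a regression model: absolute residuals on a held-out calibration set give a quantile that is added to and subtracted from new predictions. The calibration values, the function names, and the choice of absolute-residual scores are assumptions made for this example and are not taken from the review.

```python
import math


def conformal_quantile(residuals, alpha):
    """Return the split-conformal quantile of absolute calibration residuals.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)); if that rank
    exceeds n, no finite interval can be guaranteed and infinity is returned.
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(residuals)[k - 1]


def prediction_interval(y_pred, q):
    """Symmetric interval around a point prediction."""
    return (y_pred - q, y_pred + q)


# Hypothetical calibration set: measured vs. predicted activity values.
cal_true = [5.1, 6.3, 7.0, 4.8, 6.9, 5.5, 7.4, 6.1, 5.9]
cal_pred = [5.0, 6.0, 7.5, 5.0, 6.5, 5.4, 7.0, 6.4, 6.0]
residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]

q = conformal_quantile(residuals, alpha=0.2)
low, high = prediction_interval(6.2, q)
print(f"interval for a new prediction of 6.2: [{low:.2f}, {high:.2f}]")
```

The coverage guarantee of this construction rests on exchangeability between calibration and test compounds, which is exactly what scaffold or time splits and distribution shift can violate.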
Louis Plyer, Alexey A Orlov, Tagir N Akhmetshin, Erik Yeghyan, Fanny Bonachera, Dragos Horvath, Alexandre Varnek
The growing number and size of DNA-encoded libraries (DELs), together with the vast space of possible DEL designs, demand interpretable and scalable criteria for selecting which libraries to construct and screen against a given target. An ideal target-focused DEL shows both strong similarity with an active reference compound collection and high intra-DEL diversity. Chemography with Generative Topographic Mapping (GTM) has been shown to be a promising approach for selecting DELs, offering both intuitive visualization and fast quantitative analysis scalable to thousands of DEL designs. This is achieved by describing each library with a "stand-alone" vector; comparing these vectors avoids costly pairwise inter-molecular similarity calculations. However, the extent to which such "stand-alone" (SA) approaches in general, and GTM-derived SA metrics in particular, recover DELs that are reference-proximal and chemically diverse as evaluated by conventional compound pair-matching (CP) metrics in the initial descriptor space remains insufficiently characterized. In this article, we compare Morgan count fingerprint-based chemical-library similarity with GTM-derived metrics, using 100 diverse DEL subsets and a reference set of compounds tested against cyclin-dependent kinase 2 (CDK2) from ChEMBL. GTM-based SA metrics provide robust approximations of "gold standard" molecular descriptor space CP metrics for DEL selection: Spearman rank correlations fall in the 0.6-0.7 range. Our results demonstrate that GTM helps to identify DELs that best span the reference space according to the same "gold standard" molecular descriptor space metrics: SA GTM-driven rankings of libraries achieve enrichment factors at 5% (EF5%) of 4-12 (in terms of finding "gold standard" top libraries within the 5% best ranked by GTM), always picking 2 of the top 3 libraries.
The accompanying two-dimensional landscapes make intra- and interlibrary diversity visually accessible, supporting rapid, interpretable screening of alternative DEL designs. Collectively, these results position GTM as an efficient tool for chemical-library similarity assessment and target-focused DEL selection.
Interpretable and Scalable Similarity Metrics for DNA-Encoded Library Design Using Generative Topographic Mapping. Louis Plyer, Alexey A Orlov, Tagir N Akhmetshin, Erik Yeghyan, Fanny Bonachera, Dragos Horvath, Alexandre Varnek. Molecular Informatics 45(3), e70026, 2026-03-01. Journal article. DOI: 10.1002/minf.70026.
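The two agreement measures named in the abstract above, Spearman rank correlation and an enrichment factor at 5%, can be sketched in a few lines. This is an illustration under assumptions: the enrichment-factor definition used here (hit rate for the gold-standard top libraries among the fast method's top fraction, divided by that fraction) is one common convention and may differ from the paper's, and the library scores are invented.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def enrichment_factor(fast_scores, gold_scores, top_fraction=0.05, n_gold_top=3):
    """Assumed definition: recall of gold-standard top libraries within the
    top fraction ranked by the fast metric, divided by that fraction."""
    n = len(fast_scores)
    n_selected = max(1, round(n * top_fraction))
    by_fast = sorted(range(n), key=lambda i: fast_scores[i], reverse=True)
    by_gold = sorted(range(n), key=lambda i: gold_scores[i], reverse=True)
    hits = set(by_fast[:n_selected]) & set(by_gold[:n_gold_top])
    return (len(hits) / n_gold_top) / (n_selected / n)


# Hypothetical scores for 20 libraries (higher = closer to the reference set).
gold = [float(i) for i in range(20)]
fast = gold[:]
fast[3], fast[7] = fast[7], fast[3]  # the fast metric misorders two libraries
print(spearman(fast, gold), enrichment_factor(fast, gold))
```

Rank-based measures suit this comparison because library selection depends only on the ordering of candidate libraries, not on the absolute values of either metric.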
Adittya Pal, Rolf Fagerberg, Jakob Lykke Andersen, Peter Dittrich, Daniel Merkle
Analyzing synthesis pathways for target molecules in a chemical reaction network annotated with information on the kinetics of individual reactions is an area of active study. This work presents a computational methodology for searching for pathways in reaction networks, based on integer linear programing and the modeling of reaction networks by directed hypergraphs. Often multiple pathways fit the given search criteria. To rank them, we develop an objective function based on physical arguments maximizing the probability of the pathway. We furthermore develop an automated pipeline to estimate the energy barriers of individual reactions in reaction networks. Combined, the methodology facilitates flexible and kinetically informed pathway investigations on large reaction networks by computational means, even for networks coming without kinetic annotation, such as those created via generative approaches for expanding molecular spaces. To demonstrate the methodology, we apply it to a chemical reaction network generated from 2-hydroxyethanenitrile, water, and ammonia, where we search for pathways to glycine and 2-hydroxyethanoic acid using the input molecules as precursors.
Finding Pathways in Reaction Networks Guided by Energy Barriers Using Integer Linear Programing. Adittya Pal, Rolf Fagerberg, Jakob Lykke Andersen, Peter Dittrich, Daniel Merkle. Molecular Informatics 45(3), e70021, 2026-03-01. Journal article. DOI: 10.1002/minf.70021.
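The abstract above does not give the objective function, so the following is only one plausible reading, written as an integer hyperflow problem on a directed hypergraph. It assumes that each use of a reaction contributes an independent Boltzmann-like factor $\exp(-\Delta E^{\ddagger}_e / RT)$ to the pathway probability; under that assumption, maximizing the log-probability is the same as minimizing the flow-weighted sum of barriers. All symbols ($f_e$, $m_e^{\pm}$, $b_v$) are introduced here for illustration and are not the paper's notation.

```latex
% Sketch under stated assumptions; not the paper's formulation.
% V: molecules (vertices), E: reactions (directed hyperedges)
% f_e: integer number of times reaction e is used in the pathway
% m_e^+(v), m_e^-(v): multiplicity of v among products / educts of e
% b_v: required net production of v (negative for consumed precursors,
%      positive for targets, zero for intermediates)
\begin{align}
  \max_{f}\ \log P(f)
    &= \max_{f}\ \sum_{e \in E} f_e \log e^{-\Delta E^{\ddagger}_e / RT}
     \;\Longleftrightarrow\;
     \min_{f}\ \sum_{e \in E} \frac{\Delta E^{\ddagger}_e}{RT}\, f_e \\
  \text{s.t.}\quad
    & \sum_{e \in E} \bigl( m_e^{+}(v) - m_e^{-}(v) \bigr)\, f_e = b_v
      \qquad \forall v \in V, \\
    & f_e \in \mathbb{Z}_{\ge 0} \qquad \forall e \in E .
\end{align}
```

The mass-balance constraint is what makes the hypergraph model necessary: a reaction with several educts consumes all of them at once, which an ordinary graph edge cannot express.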
Lisa Lombardo, Francesco Agnello, Rosaria Gitto, Laura De Luca
Human 5-lipoxygenase (5-LOX) plays a crucial role in the biosynthesis of leukotrienes (LTs). Therefore, 5-LOX inhibitors are designed as effective agents for the treatment of several diseases such as asthma, cardiovascular disorders, allergies, and cancer. Insights into crystal structures of several 5-LOX isoforms have revealed that this protein adopts two different conformations (open/closed) through modulation of its Hα2 and arched helix regions, which are conditioned by the presence or absence of a ligand in the active site; moreover, these structures are incomplete in regions critical for ligand binding. To advance the design of 5-LOX inhibitors, we developed a computational procedure to reconstruct the first full-length open-conformation structure of 5-LOX complexed with a chelating inhibitor within the active site. Dynamic simulations and protein model validation confirmed the quality of our model, which was subsequently used for docking analyses and culminated in the development of a structure-based pharmacophore model. These computational studies might constitute powerful tools for rationally designing and identifying novel 5-LOX iron chelator inhibitors.
A Multistep Computational Approach to Achieve a Complete Human 5-Lipoxygenase Structure and Provide a Pharmacophore Model for Further Drug Design. Lisa Lombardo, Francesco Agnello, Rosaria Gitto, Laura De Luca. Molecular Informatics 45(3), e70025, 2026-03-01. Journal article. DOI: 10.1002/minf.70025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13014066/pdf/
Sania Saeed, Shahrukh Khan, Aneeqa Noor, Inga Zerr, Saima Zafar
Rapidly progressive Alzheimer's disease (rpAD) is a rare but severe form of Alzheimer's disease characterized by accelerated cognitive decline and limited therapeutic options. Conventional anti-amyloid-β interventions have shown little success due to poor target specificity, neurotoxicity, and lack of efficacy, underscoring the need for novel therapeutic strategies. This study aimed to identify and prioritize molecular targets associated with rpAD by investigating the protein interactome of amyloid-β (Aβ42) using integrative computational approaches. Functional enrichment, protein-protein interaction network analysis, and community clustering revealed that rpAD-specific Aβ42 interactors were predominantly involved in mitochondrial bioenergetics, redox regulation, and cytoskeletal stability, pathways central to neuronal survival and synaptic function. Molecular docking identified fumarate hydratase, carbonyl reductase 1, and the F-actin capping protein as high-affinity interactors of Aβ42, linking these proteins to energy failure, oxidative stress, and synaptic dysfunction. Virtual screening of a therapeutic drug library against fumarate hydratase identified several compounds with strong binding affinities, among which quinestrol, estradiol benzoate, norethindrone, tamibarotene, drospirenone, and ketanserin emerged as lead candidates. Pharmacokinetic profiling, including ADMET modeling, confirmed their blood-brain barrier permeability and drug-likeness, supporting their potential as central nervous system active agents. Together, this work highlights key molecular targets in rpAD and proposes repurposed, pharmacologically diverse compounds with multitarget neuroprotective potential. By utilizing in silico analysis, the study provides a rational framework for target discovery and drug prioritization in rpAD, offering a foundation for future experimental validation and the development of translational research.
Therapeutic Potential of Amyloid-β Interactors in Rapidly Progressive Alzheimer's Disease-An In Silico Study. Sania Saeed, Shahrukh Khan, Aneeqa Noor, Inga Zerr, Saima Zafar. Molecular Informatics 45(2), e70024, 2026-02-01. Journal article. DOI: 10.1002/minf.70024.
Diversity and properties of ring systems contained in small molecules are of high interest for applications such as drug discovery and materials science. In the present work, we extract, analyze, and classify ring systems found in small-molecule compounds of open-access databases such as PubChem, ChEMBL, DrugCentral, IUPHAR, and LOTUS. We also developed a classification taxonomy of frequently found ring systems using an Open Biomedical Ontologies (OBO)-format ontology. Open-access software was used to automate the classification of compounds into their respective ring system classes. As an example, the natural product compounds of the LOTUS database were classified and are available as an open-access ontology dataset.
Statistics and Ontology of Published Small Molecule Ring Systems. Lutz Weber. Molecular Informatics 45(2), e70022, 2026-02-01. Journal article. DOI: 10.1002/minf.70022.
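To make the notion of ring-system extraction in the abstract above concrete, here is a minimal sketch that works on a bare molecular graph without any cheminformatics toolkit. It assumes a ring system is a set of atoms connected through ring bonds, where a ring bond is any bond that is not a bridge of the graph; fused and spiro rings therefore end up in the same system. The paper's own definition and software may differ, and the function names are hypothetical.

```python
from collections import defaultdict


def find_bridges(n_atoms, bonds):
    """Return indices of bonds that lie on no cycle (graph bridges)."""
    adj = defaultdict(list)
    for idx, (a, b) in enumerate(bonds):
        adj[a].append((b, idx))
        adj[b].append((a, idx))
    disc = [-1] * n_atoms  # DFS discovery time per atom
    low = [0] * n_atoms    # earliest discovery time reachable from the subtree
    bridges = set()
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue
            if disc[v] == -1:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(idx)
            else:
                low[u] = min(low[u], disc[v])

    for atom in range(n_atoms):
        if disc[atom] == -1:
            dfs(atom, -1)
    return bridges


def ring_systems(n_atoms, bonds):
    """Group atoms joined by ring (non-bridge) bonds into ring systems."""
    bridges = find_bridges(n_atoms, bonds)
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ring_atoms = set()
    for idx, (a, b) in enumerate(bonds):
        if idx not in bridges:
            parent[find(a)] = find(b)
            ring_atoms.update((a, b))
    groups = defaultdict(set)
    for atom in ring_atoms:
        groups[find(atom)].add(atom)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Biphenyl-like graph: two six-membered rings joined by a single bond (0-6).
biphenyl = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6),
            (6, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 6)]
print(ring_systems(12, biphenyl))
```

In practice a toolkit would also carry atom types and bond orders, which are needed before two ring systems can be called identical and assigned to the same ontology class.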
Yield prediction in catalytic reactions is essential for improving chemical process efficiency and product quality. Ligands significantly influence reactivity and selectivity, highlighting the need for descriptors that accurately capture their structural and electronic properties. In this study, we focus on infrared (IR) spectra, which reflect molecular vibrational modes, and propose novel descriptors based on wavenumber information. We evaluated the predictive performance of these descriptors using two datasets: direct Pd-catalyzed arylation and Suzuki-Miyaura coupling reactions. The wavenumber-based IR descriptors outperformed conventional molecular descriptors and structural fingerprints (one-hot encoding, Mordred, MACCS, Morgan fingerprint, RDKit, and density functional theory). Notably, descriptors limited to the fingerprint region (0-1700 cm⁻¹) effectively captured key molecular features, contributing to both high prediction accuracy and improved chemical interpretability. Our results indicate that IR-based descriptors can achieve strong generalization performance even with small datasets. This approach offers a promising strategy for redefining reaction condition spaces and enhancing the interpretability of predictive models, thereby supporting more informed experimental design.
Infrared Spectral Descriptors for Reaction Yield Prediction: Toward Redefining Experimental Spaces. Yuya Endo, Hiromasa Kaneko. Molecular Informatics 45(2), e70019, 2026-02-01. Journal article. DOI: 10.1002/minf.70019. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12899324/pdf/
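The abstract above does not say how the wavenumber-based descriptors are constructed. One simple way such a descriptor could be built, shown here purely as an assumption, is to sum peak intensities into fixed-width wavenumber bins restricted to the 0-1700 cm⁻¹ window. The bin width, the peak list, and the function name are hypothetical.

```python
import math


def ir_bin_descriptor(peaks, lo=0.0, hi=1700.0, bin_width=100.0):
    """Sum peak intensities into wavenumber bins covering [lo, hi).

    peaks: iterable of (wavenumber in cm^-1, intensity) pairs.
    Peaks outside the window are ignored.
    """
    n_bins = math.ceil((hi - lo) / bin_width)
    descriptor = [0.0] * n_bins
    for wavenumber, intensity in peaks:
        if lo <= wavenumber < hi:
            descriptor[int((wavenumber - lo) // bin_width)] += intensity
    return descriptor


# Hypothetical ligand spectrum: (wavenumber, relative intensity).
ligand_peaks = [(1650.0, 0.8), (1100.0, 0.5), (1120.0, 0.2), (2950.0, 0.9)]
vector = ir_bin_descriptor(ligand_peaks)
print(len(vector), vector)
```

Because each vector element corresponds to a wavenumber range, a model coefficient or feature importance can be read back as "absorption near this frequency matters", which is the kind of interpretability the abstract emphasizes.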
Considering the limited efficacy of existing pharmacotherapies for brain tumors, the development of accurate predictive models is essential for advancing neuro-oncology treatment strategies. In this article, we introduce a drug response prediction model, DeepMoDRP, specifically designed for brain cancer. This model integrates genomic, transcriptomic, and epigenomic data from various brain tumor cell lines, including low-grade glioma, glioblastoma multiforme, and diffuse large B-cell lymphoma. To address the high-dimensional complexity inherent in gene expression and copy number variations within cell line data, we have integrated sparse autoencoders (AEs) and denoising AEs to reduce noise and redundancy. Meanwhile, one-dimensional convolutional neural networks are used to process the low-dimensional mutation and DNA methylation data. Additionally, a multiscale graph neural network is implemented to handle the drug-related data. Finally, fully connected networks are employed to generate predictions of drug responses. A series of experiments was conducted using a brain tumor dataset that was extracted and curated from public databases. The experimental results demonstrate that the proposed DeepMoDRP outperforms state-of-the-art pan-cancer baseline models in predicting drug responses for brain tumors. The downstream analysis indicates that DeepMoDRP holds significant promise for the treatment of brain tumors.
DeepMoDRP: A Multi-Omics-Based Deep Learning Framework for Drug Response Prediction in Brain Cancer. Yuxuan Li, Xiumin Shi, Lu Wang, Lianzhong Zhang. Molecular Informatics 45(2), e70020, 2026-02-01. Journal article. DOI: 10.1002/minf.70020.
Accurately predicting the fraction unbound in plasma (fup) from chemical structures is essential for understanding pharmacokinetic characteristics during the early stage of drug discovery. This prediction serves as a valuable tool for minimizing late-stage setbacks and refining subsequent screening processes. Conventional approaches often rely on complex computational methodologies that may require extensive descriptor sets, resulting in opaque models with limited interpretability. In this study, we applied the read-across strategy in combination with traditional quantitative structure-property relationship modeling to predict fup while minimizing descriptor complexity. Our method employs interpretable models (regression and classification), facilitating insight into the underlying structure-property relationships governing plasma protein binding. Through comprehensive validation and comparison with different machine learning methods, we demonstrated the superior predictive performance of the quantitative read-across structure-property relationship multiple linear regression model and the classification-based read-across structure-property relationship support vector classifier model, for regression and classification respectively, across diverse chemical compounds. This approach offers a valuable tool for predicting fup in the process of drug discovery. Overall, this study aims to advance the field of pharmacokinetic modeling by applying a read-across strategy that combines predictive power with interpretability. By elucidating the complex relationship between chemical structures and fup, our best models have the potential to inform more rational drug design approaches, ultimately contributing to the development of more effective therapeutics.
{"title":"Read-Across Structure-Property Relationship-Based Superior Prediction of Fraction Unbound in Plasma from Chemical Structure: Interpretable Models with Minimum Descriptors.","authors":"Indrasis Dasgupta, Samima Khatun, Shovanlal Gayen","doi":"10.1002/minf.70023","DOIUrl":"https://doi.org/10.1002/minf.70023","url":null,"abstract":"<p><p>Accurately predicting the fraction unbound in plasma (f<sub>up</sub>) from chemical structure is essential for understanding pharmacokinetic characteristics in the early stages of drug discovery. Such predictions help minimize late-stage setbacks and refine subsequent screening. Conventional approaches often rely on complex computational methodologies that may require extensive descriptor sets, resulting in opaque models with limited interpretability. In this study, we applied the read-across strategy in combination with traditional quantitative structure-property relationship modeling to predict f<sub>up</sub> while minimizing descriptor complexity. Our method employs interpretable regression and classification models, facilitating insight into the structure-property relationships governing plasma protein binding. Through comprehensive validation and comparison with different machine learning methods, we demonstrated the superior predictive performance of the quantitative read-across structure-property relationship multiple linear regression model and the classification-based read-across structure-property relationship support vector classifier model across diverse chemical compounds. This approach offers a valuable tool for predicting f<sub>up</sub> during drug discovery. Overall, this study aims to advance pharmacokinetic modeling by applying a read-across strategy that combines predictive power with interpretability. By elucidating the complex relationship between chemical structures and f<sub>up</sub>, our best models have the potential to support more rational drug design, ultimately contributing to the development of more effective therapeutics.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"45 2","pages":"e70023"},"PeriodicalIF":3.1,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146202152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anastasia Rudik, Leonid Stolbov, Alexey Lagunin, Dmitry Filimonov, Vladimir Poroikov
The pharmacokinetic profile of a potential drug is largely determined by its metabolic stability, which reflects its susceptibility to biotransformation. Metabolic stability data allow one to assess the therapeutic value of a compound and its toxicological risk. This assessment relies primarily on pharmacokinetic parameters, particularly half-life (t1/2) and clearance (CL), which are typically determined using in vitro systems such as hepatocytes and liver microsomal fractions. Using the publicly available ChEMBL v. 35 and PubChem databases, we collected over 8000 chemical compounds with experimental intrinsic CL and/or half-life data from liver microsome assays in mice, rats, and humans. Different thresholds were applied to distinguish stable from unstable molecules. The Naive Bayesian classifier with MNA (Multilevel Neighborhoods of Atoms) descriptors and the Self-Consistent Extreme Classifier (SCEC) with QNA (Quantitative Neighborhoods of Atoms) descriptors were used to build classification models. The AUC of most classification models exceeded 0.85. Self-Consistent Regression (SCR) was used to build quantitative models. The coefficient of determination of the regression models ranged from 0.35 (rat, t1/2) to 0.7 (human, CLint). These models were integrated into the freely available web application MetaStab-Analyzer, which provides a unique combination of qualitative (stable/unstable/moderate) and quantitative predictions for three species. A key feature of the application is that it provides numerical metrics for each prediction, which increases interpretability. This combination of innovative algorithms (SCR and SCEC), dual qualitative-quantitative assessment, and a user-friendly interface is not available in any existing tool. MetaStab-Analyzer is freely available at https://www.way2drug.com/metastab/.
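The evaluation quantities named in this abstract can be made concrete with a short sketch. The abstract says only that "different thresholds" were applied, so the 30- and 60-minute half-life cutoffs below are hypothetical placeholders, not the authors' values. The helper names are also illustrative and are not part of MetaStab-Analyzer. The sketch shows threshold-based labeling, the coefficient of determination used for regression models, and a rank-based AUC used for classifiers.

```python
def label_stability(t_half_min, low=30.0, high=60.0):
    """Map a half-life in minutes to a class; both cutoffs are hypothetical."""
    if t_half_min < low:
        return "unstable"
    if t_half_min >= high:
        return "stable"
    return "moderate"


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up half-lives (minutes) and classifier scores for illustration.
print([label_stability(t) for t in (10.0, 45.0, 90.0)])
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Read this way, an AUC above 0.85 means a stable compound is ranked above an unstable one in more than 85% of such pairs, while a coefficient of determination of 0.35 means the regression explains roughly a third of the variance in the measured values.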
{"title":"MetaStab-Analyzer: Classification and Regression Models for Metabolic Stability Prediction.","authors":"Anastasia Rudik, Leonid Stolbov, Alexey Lagunin, Dmitry Filimonov, Vladimir Poroikov","doi":"10.1002/minf.70018","DOIUrl":"https://doi.org/10.1002/minf.70018","url":null,"abstract":"<p><p>The pharmacokinetic profile of a potential drug is largely determined by its metabolic stability, which reflects its susceptibility to biotransformation. Metabolic stability data allow one to assess the therapeutic value of a compound and its toxicological risk. This assessment relies primarily on pharmacokinetic parameters, particularly half-life (t<sub>1/2</sub>) and clearance (CL), which are typically determined using in vitro systems such as hepatocytes and liver microsomal fractions. Using the publicly available ChEMBL v. 35 and PubChem databases, we collected over 8000 chemical compounds with experimental intrinsic CL and/or half-life data from liver microsome assays in mice, rats, and humans. Different thresholds were applied to distinguish stable from unstable molecules. The Naive Bayesian classifier with MNA (Multilevel Neighborhoods of Atoms) descriptors and the Self-Consistent Extreme Classifier (SCEC) with QNA (Quantitative Neighborhoods of Atoms) descriptors were used to build classification models. The AUC of most classification models exceeded 0.85. Self-Consistent Regression (SCR) was used to build quantitative models. The coefficient of determination of the regression models ranged from 0.35 (rat, t<sub>1/2</sub>) to 0.7 (human, CL<sub>int</sub>). These models were integrated into the freely available web application MetaStab-Analyzer, which provides a unique combination of qualitative (stable/unstable/moderate) and quantitative predictions for three species. A key feature of the application is that it provides numerical metrics for each prediction, which increases interpretability. This combination of innovative algorithms (SCR and SCEC), dual qualitative-quantitative assessment, and a user-friendly interface is not available in any existing tool. MetaStab-Analyzer is freely available at https://www.way2drug.com/metastab/.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"45 2","pages":"e70018"},"PeriodicalIF":3.1,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146119507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}