Pub Date: 2026-02-05 DOI: 10.1186/s13321-026-01158-w
Suwan Mao, Wenjie Tang, Li Li, Mang Jing, Yun Liu, Junjie Wang
Drug combination therapy is a well-established strategy for treating complex diseases. However, the vast combinatorial space makes exhaustive experimental screening impractical and costly. Recent studies have shown that deep learning can effectively prioritize synergistic drug combinations through nonlinear modeling and automatic feature extraction. Meanwhile, Large Language Models (LLMs) offer great promise in drug discovery. In this paper, we propose CoSynLLM, an LLM-assisted framework for predicting drug combination synergy. We use the latent knowledge embedded in LLMs to generate semantic-level chemical information, complement it with drug fingerprints that capture explicit structural details, and represent the cellular context with cell line gene expression profiles. To merge the drug and cell line representations, a hierarchical feature fusion strategy progressively integrates the features through multiple stages before the synergy prediction is made. Extensive experiments on two benchmark datasets, NCI-ALMANAC and O'Neil, demonstrate that CoSynLLM achieves competitive performance. In summary, CoSynLLM offers a robust and practical computational framework for identifying synergistic drug combinations.
CoSynLLM: predicting drug combination synergy with LLM-generated descriptions. Journal of Cheminformatics (IF 5.7), published 2026-02-05.
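The staged fusion described in the CoSynLLM abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion operators, so everything here is an assumption made for illustration: the function names are hypothetical, and plain concatenation plus a symmetric mean/absolute-difference step stand in for whatever learned layers the authors actually use.

```python
from typing import List

Vector = List[float]


def fuse_drug(semantic: Vector, fingerprint: Vector) -> Vector:
    """Stage 1 (assumed): join one drug's LLM-derived embedding with its fingerprint."""
    return list(semantic) + list(fingerprint)


def fuse_pair(drug_a: Vector, drug_b: Vector) -> Vector:
    """Stage 2 (assumed): combine two drug vectors so that the order of the drugs does not matter."""
    if len(drug_a) != len(drug_b):
        raise ValueError("drug vectors must have equal length")
    mean = [(a + b) / 2 for a, b in zip(drug_a, drug_b)]
    diff = [abs(a - b) for a, b in zip(drug_a, drug_b)]
    return mean + diff


def fuse_with_cell(pair: Vector, expression: Vector) -> Vector:
    """Stage 3 (assumed): append the cell line's gene expression profile."""
    return list(pair) + list(expression)


def build_features(sem_a: Vector, fp_a: Vector,
                   sem_b: Vector, fp_b: Vector,
                   expression: Vector) -> Vector:
    """Run the three stages in order and return one feature vector per (drug A, drug B, cell line)."""
    drug_a = fuse_drug(sem_a, fp_a)
    drug_b = fuse_drug(sem_b, fp_b)
    return fuse_with_cell(fuse_pair(drug_a, drug_b), expression)
```

The symmetric second stage reflects a common design choice for combination models: the prediction for (A, B) should match the prediction for (B, A). Whether CoSynLLM enforces this is not stated in the abstract.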
Pub Date: 2026-02-03 DOI: 10.1186/s13321-026-01159-9
Darlene Nabila Zetta, Tarapong Srisongkram
Limited experimental data remains a key challenge in applying machine learning to drug discovery, particularly for cancer-related targets. In this study, we present a data-efficient active meta-deep learning framework to predict inhibitors of mitogen-activated protein kinase 1 (MAPK1), which are promising candidates for cancer therapies. Our approach integrates active learning (AL) with a meta-model that combines four deep architectures trained on molecular descriptors and graph-based representations: a convolutional neural network, an attention network, a graph convolutional network, and a graph neural network with attention. These models generate four probability-based features that feed into an attention-based meta-learner, improving predictive performance by 5.12% in the area under the precision-recall curve (AUPRC) and 5.48% in the Matthews correlation coefficient (MCC) using only 10% of the training data. Among the AL sampling strategies evaluated, entropy sampling showed competitive performance in selecting informative molecules for model improvement. Overall, our framework achieves an AUPRC of 0.835 ± 0.017 and an MCC of 0.817 ± 0.017, on par with a traditional training method despite using only 26.7% of the training data. Compared to a conventional random forest model trained by brute force on the full (100%) training set, our approach shows a 10.6% improvement in AUPRC and modest gains in MCC, confirming the effectiveness of the proposed framework. Under severe class imbalance, balanced accuracy steadily increased across AL iterations, exceeding 0.85 at the final iteration for all uncertainty-driven strategies. Molecular docking confirmed successful prioritization of the top four predicted compounds. Evaluation on an external MAPK1 data set demonstrated generalizability, with our approach achieving an AUPRC of 0.818 and an MCC of 0.403, comparable to the independent test set. These results highlight the potential of combining intelligent data selection with deep learning architectures through the meta-model to improve predictive performance in data-scarce drug discovery.
Scientific contribution: This study contributes a novel, data-efficient active meta-deep learning framework for predicting MAPK1 inhibitors, addressing the challenge of limited experimental data for a cancer-related target. By integrating AL with a meta-model composed of four deep architectures, the approach significantly enhances predictive performance using only a fraction of the training data. The framework achieves superior metrics compared to traditional training methods, highlighting its potential to accelerate drug discovery in data-scarce settings.
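The entropy sampling strategy mentioned in the abstract can be sketched as follows. This is a generic illustration of the technique for a binary active/inactive classifier, not the authors' implementation: the function names and the `batch_size` parameter are hypothetical, and the probabilities are assumed to come from whatever model scores the unlabeled pool.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a binary prediction with P(active) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_sampling(probabilities: Sequence[float], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` pool molecules with the highest predictive entropy."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: binary_entropy(probabilities[i]),
        reverse=True,
    )
    return ranked[:batch_size]
```

In an AL loop, the selected indices would be labeled, moved from the pool to the training set, and the model retrained before the next round. Predictions near 0.5 are chosen first because they have the highest entropy.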