Explainable AI-driven prediction of APE1 inhibitors: enhancing cancer therapy with machine learning models and feature importance analysis.

IF 3.9 · Zone 2, Chemistry · JCR Q2 (Chemistry, Applied) · Molecular Diversity · Pub Date: 2025-02-21 · DOI: 10.1007/s11030-025-11133-6
Aga Basit Iqbal, Tariq Ahmad Masoodi, Ajaz A Bhat, Muzafar A Macha, Assif Assad, Syed Zubair Ahmad Shah
Explainable AI-driven prediction of APE1 inhibitors: enhancing cancer therapy with machine learning models and feature importance analysis.

The viability of cells and the integrity of the genome depend on the detection and repair of damaged DNA through intricate mechanisms. Cancer treatment employs chemotherapy or radiation therapy to eliminate neoplastic cells by causing substantial damage to their DNA. In many cases, improved DNA repair mechanisms lead to resistance to these medicines; therefore, it is essential to expand efforts to develop drugs that can sensitise cells to these treatments by inhibiting the DNA repair process. Multiple studies have demonstrated a correlation between the overexpression of Apurinic/Apyrimidinic Endonuclease (APE1), the primary mammalian enzyme responsible for excising apurinic or apyrimidinic sites in DNA, and the resistance of cells to cancer therapies; in contrast, APE1 downregulation increases cellular susceptibility to DNA-damaging agents. Thus, the effectiveness of existing therapies can be improved by promoting the targeted sensitization of cancer cells while protecting healthy cells. The current study aims to employ explainable artificial intelligence (XAI) to enhance the accuracy and reliability of machine learning models for the prediction of APE1 inhibitors. Various ML-based regression models are employed to predict the pIC50 value of different medicines. Bayesian optimization and the Permutation Feature Importance (PFI) approach are employed to determine the best hyperparameters of machine learning models and to discover the most significant features for recognizing drug candidates that target APE1 enzymes, respectively. To acquire comprehensive elucidations for the predictive models in our research, two XAI methodologies, namely SHAP and LIME, are used. The SHAP analysis reveals that the features 'C1SP2' and 'ASP-2' are essential in influencing the model's predictions. The SHAP values demonstrate variability for features such as 'maxHBint2' and 'GATS1s,' signifying that their impact is dependent on specific instances within the dataset. 
The LIME study corroborates these findings, demonstrating that 'C1SP2' and 'ASP-2' are the most significant positive contributors, whereas features like 'SHCHnX,' 'nHdCH2,' and 'GATS1s' result in a decrease in the predicted values. Due to the limited sample size of the APE1 dataset, direct training on this dataset posed challenges in model generalization and reliability. To overcome this limitation, the BACE-1 dataset is leveraged for model training, enabling the ML models to learn from a more extensive and diverse chemical space. Among the tested algorithms, XGBoost demonstrated superior predictive performance, achieving R² = 0.890, MAE = 0.186, and RMSE = 0.245, significantly surpassing state-of-the-art methods, such as LightGBM and QSAR-ML, which attained R² scores of 0.798 and 0.630, respectively. These results highlight the robustness of our approach, demonstrating its enhanced generalization capability and superior predictive accuracy compared to existing methodologies.
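
The abstract names three regression metrics (R², MAE, RMSE), a pIC50 target, and Permutation Feature Importance without showing how any of them is computed. The following is a minimal sketch in plain Python, not the authors' pipeline: the two-feature toy data and the `toy_predict` function are hypothetical stand-ins for the molecular descriptors and the trained XGBoost model, and the pIC50 helper assumes IC50 values are given in nanomolar. Importance is measured here as the mean increase in RMSE after shuffling one feature column, which is one common convention.

```python
import math
import random


def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); assumes the input IC50 is in nM."""
    return 9.0 - math.log10(ic50_nm)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = rmse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy setup: feature 0 drives the target, feature 1 is ignored.
_rng = random.Random(42)
X_toy = [[_rng.random(), _rng.random()] for _ in range(20)]


def toy_predict(row):
    """Stand-in for a trained regressor; uses only the first feature."""
    return 2.0 * row[0]


y_toy = [toy_predict(row) for row in X_toy]
importances = permutation_importance(toy_predict, X_toy, y_toy)
print("permutation importances:", importances)
```

In this toy setup, shuffling the ignored feature leaves the error unchanged, so its importance is zero, while shuffling the informative feature raises the RMSE. With a real model, the same loop would be run on held-out data rather than on the training set.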

Source journal: Molecular Diversity (Chemistry: Chemistry, Multidisciplinary)
CiteScore: 7.30 · Self-citation rate: 7.90% · Articles published: 219 · Review time: 2.7 months
About the journal: Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including: combinatorial chemistry and parallel synthesis; small molecule libraries; microwave synthesis; flow synthesis; fluorous synthesis; diversity oriented synthesis (DOS); nanoreactors; click chemistry; multiplex technologies; fragment- and ligand-based design; structure/function/SAR; computational chemistry and molecular design; chemoinformatics; screening techniques and screening interfaces; analytical and purification methods; robotics, automation and miniaturization; targeted libraries; display libraries; peptides and peptoids; proteins; oligonucleotides; carbohydrates; natural diversity; new methods of library formulation and deconvolution; directed evolution, origin of life and recombination; search techniques, landscapes, random chemistry and more.