Drug-Protein Interactions Prediction Models Using Feature Selection and Classification Techniques

IF 2.1 | Zone 4, Medicine | Q4, Biochemistry & Molecular Biology | Current Drug Metabolism | Pub Date: 2024-01-05 | DOI: 10.2174/0113892002268739231211063718
T. Idhaya, A. Suruliandi, S. P. Raja
Citations: 0

Abstract

Background: Drug-Protein Interaction (DPI) identification is crucial in drug discovery. The high dimensionality of drug and protein features poses challenges for accurate interaction prediction, necessitating the use of computational techniques. Docking-based methods rely on 3D structures, while ligand-based methods have limitations such as reliance on known ligands and neglect of protein structure. Therefore, the preferred approach is the chemogenomics-based approach using machine learning, which considers both drug and protein characteristics for DPI prediction.

Methods: In machine learning, feature selection plays a vital role in improving model performance, reducing overfitting, enhancing interpretability, and making the learning process more efficient. It helps extract meaningful patterns from drug and protein data while eliminating irrelevant or redundant information, resulting in more effective machine-learning models. Classification, in turn, is of great importance as it enables pattern recognition, decision-making, predictive modeling, anomaly detection, data exploration, and automation. It empowers machines to make accurate predictions and facilitates efficient decision-making in DPI prediction. For this research work, protein data was sourced from the KEGG database, while drug data was obtained from the DrugBank database.

Results: To address the issue of imbalanced Drug Protein Pairs (DPP), different balancing techniques, namely Random Over Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive SMOTE, were employed. Given the large number of features associated with drugs and proteins, feature selection becomes necessary. Four feature selection methods were evaluated: Correlation, Information Gain (IG), Chi-Square (CS), and Relief. Multiple classification methods, including Support Vector Machines (SVM), Random Forest (RF), AdaBoost, and Logistic Regression (LR), were used to predict DPI. Finally, this research identifies the best balancing, feature selection, and classification methods for accurate DPI prediction.

Conclusion: This comprehensive approach aims to overcome the limitations of existing methods and provide more reliable and efficient predictions in drug-protein interaction studies.
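
To make the balancing and feature selection steps named in the abstract more concrete, the following is a minimal sketch of Random Over Sampling followed by Information Gain ranking, written in plain Python. It is not the authors' implementation, which the abstract does not describe. The function names, the toy data, and the assumption that features are discrete (here binary) are all illustrative choices, not taken from the paper.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    is as large as the majority class (Random Over Sampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy of the labels minus the expected entropy that remains
    after splitting on one discrete feature column."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(X, y, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: each row is a binary feature vector for one
# drug-protein pair; label 1 = interacting, 0 = non-interacting.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
y = [1, 1, 0, 0, 0, 0]

X_bal, y_bal = random_oversample(X, y, seed=42)
selected, scores = top_k_features(X_bal, y_bal, k=1)
print(Counter(y_bal), selected)
```

In a real study the selected feature subset would then be passed to a classifier such as SVM, RF, AdaBoost, or LR. Oversampling should be applied only to the training split so that duplicated pairs do not leak into the evaluation data.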
Source journal
Current Drug Metabolism (Medicine: Biochemistry & Molecular Biology)
CiteScore: 4.30
Self-citation rate: 4.30%
Articles published: 81
Review time: 4-8 weeks
Journal description: Current Drug Metabolism aims to cover all the latest and outstanding developments in drug metabolism, pharmacokinetics, and drug disposition. The journal serves as an international forum for the publication of full-length and mini reviews, research articles, and guest-edited issues in drug metabolism. Current Drug Metabolism is an essential journal for academic, clinical, government, and pharmaceutical scientists who wish to be kept informed and up to date with the most important developments. The journal covers the following general topic areas: pharmaceutics, pharmacokinetics, toxicology, and, most importantly, drug metabolism. More specifically, it covers in vitro and in vivo drug metabolism of phase I and phase II enzymes or metabolic pathways; drug-drug interactions and enzyme kinetics; pharmacokinetics, pharmacokinetic-pharmacodynamic modeling, and toxicokinetics; interspecies differences in metabolism or pharmacokinetics, species scaling and extrapolations; drug transporters; target organ toxicity and interindividual variability in drug exposure-response; extrahepatic metabolism; bioactivation, reactive metabolites, and developments for the identification of drug metabolites. Preclinical and clinical reviews describing the drug metabolism and pharmacokinetics of marketed drugs or drug classes are also covered.