{"title":"使用特征选择和分类技术的药物-蛋白质相互作用预测模型","authors":"T. Idhaya, A. Suruliandi, S. P. Raja","doi":"10.2174/0113892002268739231211063718","DOIUrl":null,"url":null,"abstract":"Background: Drug-Protein Interaction (DPI) identification is crucial in drug discovery. The high dimensionality of drug and protein features poses challenges for accurate interaction prediction, necessitating the use of computational techniques. Docking-based methods rely on 3D structures, while ligand-based methods have limitations such as reliance on known ligands and neglecting protein structure. Therefore, the preferred approach is the chemogenomics-based approach using machine learning, which considers both drug and protein characteristics for DPI prediction. Methods: In machine learning, feature selection plays a vital role in improving model performance, reducing overfitting, enhancing interpretability, and making the learning process more efficient. It helps extract meaningful patterns from drug and protein data while eliminating irrelevant or redundant information, resulting in more effective machine-learning models. On the other hand, classification is of great importance as it enables pattern recognition, decision-making, predictive modeling, anomaly detection, data exploration, and automation. It empowers machines to make accurate predictions and facilitates efficient decision-making in DPI prediction. For this research work, protein data was sourced from the KEGG database, while drug data was obtained from the DrugBank data machine-learning base. Results: To address the issue of imbalanced Drug Protein Pairs (DPP), different balancing techniques like Random Over Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive SMOTE were employed. Given the large number of features associated with drugs and proteins, feature selection becomes necessary. Various feature selection methods were evaluated: Correlation, Information Gain (IG), Chi-Square (CS), and Relief. Multiple classification methods, including Support Vector Machines (SVM), Random Forest (RF), Adaboost, and Logistic Regression (LR), were used to predict DPI. Finally, this research identifies the best balancing, feature selection, and classification methods for accurate DPI prediction. Conclusion: This comprehensive approach aims to overcome the limitations of existing methods and provide more reliable and efficient predictions in drug-protein interaction studies.","PeriodicalId":10770,"journal":{"name":"Current drug metabolism","volume":"79 1","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Drug-Protein Interactions Prediction Models Using Feature Selection and Classification Techniques\",\"authors\":\"T. Idhaya, A. Suruliandi, S. P. Raja\",\"doi\":\"10.2174/0113892002268739231211063718\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Drug-Protein Interaction (DPI) identification is crucial in drug discovery. The high dimensionality of drug and protein features poses challenges for accurate interaction prediction, necessitating the use of computational techniques. Docking-based methods rely on 3D structures, while ligand-based methods have limitations such as reliance on known ligands and neglecting protein structure. Therefore, the preferred approach is the chemogenomics-based approach using machine learning, which considers both drug and protein characteristics for DPI prediction. Methods: In machine learning, feature selection plays a vital role in improving model performance, reducing overfitting, enhancing interpretability, and making the learning process more efficient. It helps extract meaningful patterns from drug and protein data while eliminating irrelevant or redundant information, resulting in more effective machine-learning models. On the other hand, classification is of great importance as it enables pattern recognition, decision-making, predictive modeling, anomaly detection, data exploration, and automation. It empowers machines to make accurate predictions and facilitates efficient decision-making in DPI prediction. For this research work, protein data was sourced from the KEGG database, while drug data was obtained from the DrugBank data machine-learning base. Results: To address the issue of imbalanced Drug Protein Pairs (DPP), different balancing techniques like Random Over Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive SMOTE were employed. Given the large number of features associated with drugs and proteins, feature selection becomes necessary. Various feature selection methods were evaluated: Correlation, Information Gain (IG), Chi-Square (CS), and Relief. Multiple classification methods, including Support Vector Machines (SVM), Random Forest (RF), Adaboost, and Logistic Regression (LR), were used to predict DPI. Finally, this research identifies the best balancing, feature selection, and classification methods for accurate DPI prediction. Conclusion: This comprehensive approach aims to overcome the limitations of existing methods and provide more reliable and efficient predictions in drug-protein interaction studies.\",\"PeriodicalId\":10770,\"journal\":{\"name\":\"Current drug metabolism\",\"volume\":\"79 1\",\"pages\":\"\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2024-01-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Current drug metabolism\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2174/0113892002268739231211063718\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current drug metabolism","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2174/0113892002268739231211063718","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
Drug-Protein Interactions Prediction Models Using Feature Selection and Classification Techniques
Background: Drug-Protein Interaction (DPI) identification is crucial in drug discovery. The high dimensionality of drug and protein features poses challenges for accurate interaction prediction, necessitating the use of computational techniques. Docking-based methods rely on 3D structures, while ligand-based methods have limitations such as reliance on known ligands and neglecting protein structure. Therefore, the preferred approach is the chemogenomics-based approach using machine learning, which considers both drug and protein characteristics for DPI prediction. Methods: In machine learning, feature selection plays a vital role in improving model performance, reducing overfitting, enhancing interpretability, and making the learning process more efficient. It helps extract meaningful patterns from drug and protein data while eliminating irrelevant or redundant information, resulting in more effective machine-learning models. On the other hand, classification is of great importance as it enables pattern recognition, decision-making, predictive modeling, anomaly detection, data exploration, and automation. It empowers machines to make accurate predictions and facilitates efficient decision-making in DPI prediction. For this research work, protein data was sourced from the KEGG database, while drug data was obtained from the DrugBank data machine-learning base. Results: To address the issue of imbalanced Drug Protein Pairs (DPP), different balancing techniques like Random Over Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive SMOTE were employed. Given the large number of features associated with drugs and proteins, feature selection becomes necessary. Various feature selection methods were evaluated: Correlation, Information Gain (IG), Chi-Square (CS), and Relief. Multiple classification methods, including Support Vector Machines (SVM), Random Forest (RF), Adaboost, and Logistic Regression (LR), were used to predict DPI. Finally, this research identifies the best balancing, feature selection, and classification methods for accurate DPI prediction. Conclusion: This comprehensive approach aims to overcome the limitations of existing methods and provide more reliable and efficient predictions in drug-protein interaction studies.
期刊介绍:
Current Drug Metabolism aims to cover all the latest and outstanding developments in drug metabolism, pharmacokinetics, and drug disposition. The journal serves as an international forum for the publication of full-length/mini review, research articles and guest edited issues in drug metabolism. Current Drug Metabolism is an essential journal for academic, clinical, government and pharmaceutical scientists who wish to be kept informed and up-to-date with the most important developments. The journal covers the following general topic areas: pharmaceutics, pharmacokinetics, toxicology, and most importantly drug metabolism.
More specifically, in vitro and in vivo drug metabolism of phase I and phase II enzymes or metabolic pathways; drug-drug interactions and enzyme kinetics; pharmacokinetics, pharmacokinetic-pharmacodynamic modeling, and toxicokinetics; interspecies differences in metabolism or pharmacokinetics, species scaling and extrapolations; drug transporters; target organ toxicity and interindividual variability in drug exposure-response; extrahepatic metabolism; bioactivation, reactive metabolites, and developments for the identification of drug metabolites. Preclinical and clinical reviews describing the drug metabolism and pharmacokinetics of marketed drugs or drug classes.