{"title":"AutoTarget: Disease-Associated druggable target identification via node representation learning in PPI networks","authors":"","doi":"10.1016/j.crbiot.2024.100260","DOIUrl":null,"url":null,"abstract":"<div><div>Drug target discovery, a pivotal early stage in drug development, is resource-intensive and crucial for ensuring drug efficacy. This study presents AutoTarget, a novel computational pipeline designed to identify disease-associated druggable targets by applying node representation learning to protein–protein interaction (PPI) networks. AutoTarget uses node2vec + for node classification, incorporating neighborhood context and structural equivalence in PPI networks derived from the STRING database. Data from the Therapeutic Target Database (TTD) and DisGeNET were integrated to identify known drug targets and gene-disease associations, respectively. Each protein is embedded into a 128-dimensional vector space, capturing local network structures and enabling the identification of structurally equivalent proteins. A Naïve Bayes classifier, trained on these embeddings, achieved a recall of 0.90 and an F1 score of 0.79 in predicting potential drug targets. AutoTarget identified 3,979 novel potential druggable target proteins out of 19,333 proteins in the PPI network, which were mapped to 23,363 diseases using DisGeNET. This creates a comprehensive resource for disease-specific drug target exploration. Case studies on triple-negative breast cancer and obesity demonstrated AutoTarget’s capability to identify both established and emerging targets, such as CD44, MAPK3, and GIP. Visualization of embedding vectors using t-SNE revealed clear separations between functional protein families, including nuclear proteins, growth factor receptors, and the G proteins within the kinase proteins. This supports the method’s ability to capture biologically relevant information. However, limitations were noted, including the inability to distinguish between different types of disease-associated proteins based solely on network features. Overall, this study advances the application of machine learning and network theory for identifying druggable targets across a wide range of diseases. AutoTarget provides researchers with a valuable tool for expediting the discovery of novel druggable targets, potentially streamlining the drug discovery process. The AutoTarget code and database are publicly available to facilitate further research.</div></div>","PeriodicalId":52676,"journal":{"name":"Current Research in Biotechnology","volume":null,"pages":null},"PeriodicalIF":3.6000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Research in Biotechnology","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2590262824000868","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOTECHNOLOGY & APPLIED MICROBIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Drug target discovery, a pivotal early stage in drug development, is resource-intensive and crucial for ensuring drug efficacy. This study presents AutoTarget, a novel computational pipeline designed to identify disease-associated druggable targets by applying node representation learning to protein–protein interaction (PPI) networks. AutoTarget uses node2vec + for node classification, incorporating neighborhood context and structural equivalence in PPI networks derived from the STRING database. Data from the Therapeutic Target Database (TTD) and DisGeNET were integrated to identify known drug targets and gene-disease associations, respectively. Each protein is embedded into a 128-dimensional vector space, capturing local network structures and enabling the identification of structurally equivalent proteins. A Naïve Bayes classifier, trained on these embeddings, achieved a recall of 0.90 and an F1 score of 0.79 in predicting potential drug targets. AutoTarget identified 3,979 novel potential druggable target proteins out of 19,333 proteins in the PPI network, which were mapped to 23,363 diseases using DisGeNET. This creates a comprehensive resource for disease-specific drug target exploration. Case studies on triple-negative breast cancer and obesity demonstrated AutoTarget’s capability to identify both established and emerging targets, such as CD44, MAPK3, and GIP. Visualization of embedding vectors using t-SNE revealed clear separations between functional protein families, including nuclear proteins, growth factor receptors, and the G proteins within the kinase proteins. This supports the method’s ability to capture biologically relevant information. However, limitations were noted, including the inability to distinguish between different types of disease-associated proteins based solely on network features. Overall, this study advances the application of machine learning and network theory for identifying druggable targets across a wide range of diseases. AutoTarget provides researchers with a valuable tool for expediting the discovery of novel druggable targets, potentially streamlining the drug discovery process. The AutoTarget code and database are publicly available to facilitate further research.
期刊介绍:
Current Research in Biotechnology (CRBIOT) is a new primary research, gold open access journal from Elsevier. CRBIOT publishes original papers, reviews, and short communications (including viewpoints and perspectives) resulting from research in biotechnology and biotech-associated disciplines.
Current Research in Biotechnology is a peer-reviewed gold open access (OA) journal and upon acceptance all articles are permanently and freely available. It is a companion to the highly regarded review journal Current Opinion in Biotechnology (2018 CiteScore 8.450) and is part of the Current Opinion and Research (CO+RE) suite of journals. All CO+RE journals leverage the Current Opinion legacy-of editorial excellence, high-impact, and global reach-to ensure they are a widely read resource that is integral to scientists' workflow.