AutoTarget: Disease-Associated druggable target identification via node representation learning in PPI networks

IF 4 Q2 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Current Research in Biotechnology Pub Date : 2024-01-01 DOI:10.1016/j.crbiot.2024.100260

Hyunseung Kong , Inyoung Kim , Byoung-Tak Zhang

{"title":"AutoTarget: Disease-Associated druggable target identification via node representation learning in PPI networks","authors":"Hyunseung Kong , Inyoung Kim , Byoung-Tak Zhang","doi":"10.1016/j.crbiot.2024.100260","DOIUrl":null,"url":null,"abstract":"<div><div>Drug target discovery, a pivotal early stage in drug development, is resource-intensive and crucial for ensuring drug efficacy. This study presents AutoTarget, a novel computational pipeline designed to identify disease-associated druggable targets by applying node representation learning to protein–protein interaction (PPI) networks. AutoTarget uses node2vec + for node classification, incorporating neighborhood context and structural equivalence in PPI networks derived from the STRING database. Data from the Therapeutic Target Database (TTD) and DisGeNET were integrated to identify known drug targets and gene-disease associations, respectively. Each protein is embedded into a 128-dimensional vector space, capturing local network structures and enabling the identification of structurally equivalent proteins. A Naïve Bayes classifier, trained on these embeddings, achieved a recall of 0.90 and an F1 score of 0.79 in predicting potential drug targets. AutoTarget identified 3,979 novel potential druggable target proteins out of 19,333 proteins in the PPI network, which were mapped to 23,363 diseases using DisGeNET. This creates a comprehensive resource for disease-specific drug target exploration. Case studies on triple-negative breast cancer and obesity demonstrated AutoTarget’s capability to identify both established and emerging targets, such as CD44, MAPK3, and GIP. Visualization of embedding vectors using t-SNE revealed clear separations between functional protein families, including nuclear proteins, growth factor receptors, and the G proteins within the kinase proteins. This supports the method’s ability to capture biologically relevant information. However, limitations were noted, including the inability to distinguish between different types of disease-associated proteins based solely on network features. Overall, this study advances the application of machine learning and network theory for identifying druggable targets across a wide range of diseases. AutoTarget provides researchers with a valuable tool for expediting the discovery of novel druggable targets, potentially streamlining the drug discovery process. The AutoTarget code and database are publicly available to facilitate further research.</div></div>","PeriodicalId":52676,"journal":{"name":"Current Research in Biotechnology","volume":"8 ","pages":"Article 100260"},"PeriodicalIF":4.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Research in Biotechnology","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2590262824000868","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOTECHNOLOGY & APPLIED MICROBIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Drug target discovery, a pivotal early stage in drug development, is resource-intensive and crucial for ensuring drug efficacy. This study presents AutoTarget, a novel computational pipeline designed to identify disease-associated druggable targets by applying node representation learning to protein–protein interaction (PPI) networks. AutoTarget uses node2vec + for node classification, incorporating neighborhood context and structural equivalence in PPI networks derived from the STRING database. Data from the Therapeutic Target Database (TTD) and DisGeNET were integrated to identify known drug targets and gene-disease associations, respectively. Each protein is embedded into a 128-dimensional vector space, capturing local network structures and enabling the identification of structurally equivalent proteins. A Naïve Bayes classifier, trained on these embeddings, achieved a recall of 0.90 and an F1 score of 0.79 in predicting potential drug targets. AutoTarget identified 3,979 novel potential druggable target proteins out of 19,333 proteins in the PPI network, which were mapped to 23,363 diseases using DisGeNET. This creates a comprehensive resource for disease-specific drug target exploration. Case studies on triple-negative breast cancer and obesity demonstrated AutoTarget’s capability to identify both established and emerging targets, such as CD44, MAPK3, and GIP. Visualization of embedding vectors using t-SNE revealed clear separations between functional protein families, including nuclear proteins, growth factor receptors, and the G proteins within the kinase proteins. This supports the method’s ability to capture biologically relevant information. However, limitations were noted, including the inability to distinguish between different types of disease-associated proteins based solely on network features. Overall, this study advances the application of machine learning and network theory for identifying druggable targets across a wide range of diseases. AutoTarget provides researchers with a valuable tool for expediting the discovery of novel druggable targets, potentially streamlining the drug discovery process. The AutoTarget code and database are publicly available to facilitate further research.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

AutoTarget：通过 PPI 网络中的节点表示学习识别疾病相关的可药物靶点

药物靶点发现是药物开发的关键性早期阶段，需要耗费大量资源，对确保药物疗效至关重要。本研究介绍了 AutoTarget，这是一种新型计算管道，旨在通过将节点表示学习应用于蛋白质-蛋白质相互作用（PPI）网络来识别与疾病相关的可药物靶点。AutoTarget 使用 node2vec + 进行节点分类，将邻域上下文和结构等价性纳入 STRING 数据库中的 PPI 网络。治疗靶点数据库（TTD）和 DisGeNET 的数据被整合在一起，分别用于识别已知的药物靶点和基因-疾病关联。每个蛋白质都被嵌入到一个 128 维的向量空间中，从而捕捉局部网络结构并识别结构上等同的蛋白质。根据这些嵌入训练的奈夫贝叶斯分类器在预测潜在药物靶点方面的召回率为 0.90，F1 得分为 0.79。在 PPI 网络中的 19,333 个蛋白质中，AutoTarget 发现了 3,979 个新的潜在药物靶标蛋白质，这些蛋白质通过 DisGeNET 映射到 23,363 种疾病上。这为探索特定疾病的药物靶点提供了全面的资源。关于三阴性乳腺癌和肥胖症的案例研究表明，AutoTarget 有能力识别 CD44、MAPK3 和 GIP 等已确立的和新出现的靶点。使用 t-SNE 对嵌入载体进行可视化显示了功能蛋白家族之间的明显分离，包括核蛋白、生长因子受体和激酶蛋白中的 G 蛋白。这支持了该方法捕捉生物相关信息的能力。不过，该方法也存在局限性，包括无法仅根据网络特征区分不同类型的疾病相关蛋白。总之，这项研究推动了机器学习和网络理论在识别各种疾病的可药靶点方面的应用。AutoTarget 为研究人员加快发现新型可药用靶点提供了宝贵的工具，有可能简化药物发现过程。AutoTarget 的代码和数据库都是公开的，以便于进一步的研究。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Current Research in Biotechnology Biochemistry, Genetics and Molecular Biology-Biotechnology

CiteScore

6.70

自引率

3.60%

发文量

审稿时长

38 days

期刊介绍： Current Research in Biotechnology (CRBIOT) is a new primary research, gold open access journal from Elsevier. CRBIOT publishes original papers, reviews, and short communications (including viewpoints and perspectives) resulting from research in biotechnology and biotech-associated disciplines. Current Research in Biotechnology is a peer-reviewed gold open access (OA) journal and upon acceptance all articles are permanently and freely available. It is a companion to the highly regarded review journal Current Opinion in Biotechnology (2018 CiteScore 8.450) and is part of the Current Opinion and Research (CO+RE) suite of journals. All CO+RE journals leverage the Current Opinion legacy-of editorial excellence, high-impact, and global reach-to ensure they are a widely read resource that is integral to scientists' workflow.