Hub‐aware random walk graph embedding methods for classification

IF 3.6 4区数学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Statistical Analysis and Data Mining Pub Date : 2024-04-01 DOI:10.1002/sam.11676

Aleksandar Tomčić, Miloš Savić, Miloš Radovanović

{"title":"Hub‐aware random walk graph embedding methods for classification","authors":"Aleksandar Tomčić, Miloš Savić, Miloš Radovanović","doi":"10.1002/sam.11676","DOIUrl":null,"url":null,"abstract":"In the last two decades, we are witnessing a huge increase of valuable big data structured in the form of graphs or networks. To apply traditional machine learning and data analytic techniques to such data it is necessary to transform graphs into vector‐based representations that preserve the most essential structural properties of graphs. For this purpose, a large number of graph embedding methods have been proposed in the literature. Most of them produce general‐purpose embeddings suitable for a variety of applications such as node clustering, node classification, graph visualization and link prediction. In this article, we propose two novel graph embedding algorithms based on random walks that are specifically designed for the node classification problem. Random walk sampling strategies of the proposed algorithms have been designed to pay special attention to hubs–high‐degree nodes that have the most critical role for the overall connectedness in large‐scale graphs. The proposed methods are experimentally evaluated by analyzing the classification performance of three classification algorithms trained on embeddings of real‐world networks. The obtained results indicate that our methods considerably improve the predictive power of examined classifiers compared with currently the most popular random walk method for generating general‐purpose graph embeddings (node2vec).","PeriodicalId":48684,"journal":{"name":"Statistical Analysis and Data Mining","volume":"60 1","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Analysis and Data Mining","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1002/sam.11676","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

In the last two decades, we are witnessing a huge increase of valuable big data structured in the form of graphs or networks. To apply traditional machine learning and data analytic techniques to such data it is necessary to transform graphs into vector‐based representations that preserve the most essential structural properties of graphs. For this purpose, a large number of graph embedding methods have been proposed in the literature. Most of them produce general‐purpose embeddings suitable for a variety of applications such as node clustering, node classification, graph visualization and link prediction. In this article, we propose two novel graph embedding algorithms based on random walks that are specifically designed for the node classification problem. Random walk sampling strategies of the proposed algorithms have been designed to pay special attention to hubs–high‐degree nodes that have the most critical role for the overall connectedness in large‐scale graphs. The proposed methods are experimentally evaluated by analyzing the classification performance of three classification algorithms trained on embeddings of real‐world networks. The obtained results indicate that our methods considerably improve the predictive power of examined classifiers compared with currently the most popular random walk method for generating general‐purpose graph embeddings (node2vec).

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

用于分类的中枢感知随机漫步图嵌入方法

在过去二十年里，我们目睹了以图形或网络为结构的有价值大数据的大量增加。要将传统的机器学习和数据分析技术应用于此类数据，就必须将图转换为基于向量的表示法，以保留图最基本的结构特性。为此，文献中提出了大量图嵌入方法。其中大多数方法产生的通用嵌入适合于各种应用，如节点聚类、节点分类、图可视化和链接预测。在本文中，我们提出了两种基于随机游走的新型图嵌入算法，专门用于解决节点分类问题。所提算法的随机游走采样策略特别关注大规模图中对整体连通性起最关键作用的枢纽--高度节点。通过分析在真实世界网络嵌入上训练的三种分类算法的分类性能，对提出的方法进行了实验评估。结果表明，与目前最流行的生成通用图嵌入的随机漫步方法（node2vec）相比，我们的方法大大提高了所研究分类器的预测能力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Statistical Analysis and Data Mining COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCEC-COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

CiteScore

3.20

自引率

7.70%

发文量

期刊介绍： Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques, and discuss their application to real problems, in such a way that they are accessible and beneficial to domain experts across science, engineering, and commerce. The focus of the journal is on papers which satisfy one or more of the following criteria: Solve data analysis problems associated with massive, complex datasets Develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, operation research. Formulate and solve high-impact real-world problems which challenge existing paradigms via new statistical and/or computational models Provide survey to prominent research topics.