Neighbor Distribution Learning for Minority Class Augmentation
Authors: Mengting Zhou; Zhiguo Gong
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 8901-8913
Published: 2024-08-21
DOI: 10.1109/TKDE.2024.3447014
URL: https://ieeexplore.ieee.org/document/10643319/
Graph Neural Networks (GNNs) have achieved remarkable success in graph-based tasks. However, learning unbiased node representations under class-imbalanced training data remains challenging. Existing solutions may overfit because they extensively reuse the limited labeled data in minority classes. Furthermore, many works address the class-imbalance issue using embeddings generated by biased GNNs, which makes the models intrinsically biased towards majority classes. In this paper, we propose GraphGLS, a novel data augmentation strategy for semi-supervised class-imbalanced node classification, which selects informative unlabeled nodes to augment minority classes using both global and local information. Specifically, we first design a Global Selection module to learn global information (pseudo-labels) for unlabeled nodes and then select candidates among them for minority classes. The Local Selection module further filters those candidate nodes by comparing their neighbor distributions with those of the minority classes. To achieve this, we design a neighbor distribution auto-encoder to learn a robust node-level neighbor distribution for each node. We then define a class-level neighbor distribution to capture the overall neighbor characteristics of nodes within the same class. We conduct extensive experiments on multiple datasets, and the results demonstrate the superiority of GraphGLS over state-of-the-art baselines.
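The two-stage selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so every name and threshold here (`select_for_minority`, `conf_thresh`, `sim_thresh`) is hypothetical. In particular, the learned neighbor distribution auto-encoder is replaced by a plain histogram of neighbor labels, and cosine similarity is an assumed choice for comparing distributions.

```python
from math import sqrt


def neighbor_distribution(node, adj, labels, num_classes):
    """Empirical distribution of (pseudo-)labels among a node's neighbors.

    Assumption: a simple histogram stands in for the learned
    node-level neighbor distribution mentioned in the abstract.
    """
    counts = [0.0] * num_classes
    for nb in adj.get(node, []):
        counts[labels[nb]] += 1.0
    total = sum(counts)
    if total == 0:
        # Isolated node: fall back to a uniform distribution.
        return [1.0 / num_classes] * num_classes
    return [c / total for c in counts]


def class_distribution(cls, labeled, adj, labels, num_classes):
    """Class-level distribution: mean over labeled nodes of that class."""
    members = [n for n in labeled if labeled[n] == cls]
    if not members:
        raise ValueError(f"no labeled nodes for class {cls}")
    dists = [neighbor_distribution(n, adj, labels, num_classes) for n in members]
    return [sum(d[k] for d in dists) / len(dists) for k in range(num_classes)]


def cosine(p, q):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def select_for_minority(cls, probs, labeled, adj, num_classes,
                        conf_thresh=0.7, sim_thresh=0.8):
    """Return unlabeled nodes that pass both the global and the local step."""
    # Pseudo-labels: true labels where known, argmax of probs otherwise.
    pseudo = dict(labeled)
    for n, p in probs.items():
        if n not in labeled:
            pseudo[n] = max(range(num_classes), key=lambda k: p[k])
    # Global step: confident pseudo-label equal to the minority class.
    candidates = [n for n, p in probs.items()
                  if n not in labeled and pseudo[n] == cls and max(p) >= conf_thresh]
    # Local step: keep candidates whose neighborhood resembles the class's.
    target = class_distribution(cls, labeled, adj, pseudo, num_classes)
    return [n for n in candidates
            if cosine(neighbor_distribution(n, adj, pseudo, num_classes), target) >= sim_thresh]


# Toy graph: class 0 is the majority, class 1 the minority.
adj = {0: [1, 5, 6], 1: [0, 5], 2: [6], 3: [4], 4: [3], 5: [0, 1], 6: [0, 2]}
labeled = {0: 0, 1: 0, 2: 0, 3: 1}
probs = {4: [0.1, 0.9], 5: [0.2, 0.8], 6: [0.9, 0.1]}  # classifier outputs (made up)

selected = select_for_minority(1, probs, labeled, adj, num_classes=2)
print(selected)  # [4]
```

In the toy graph, nodes 4 and 5 both receive a confident minority pseudo-label, but node 5 is surrounded by majority-class nodes, so the local step rejects it and only node 4 is kept.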
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.