Authors: Bay-Yuan Hsu; Chia-Hsun Lu; Ming-Yi Chang; Chih-Ying Tseng; Chih-Ya Shen
DOI: 10.1109/TKDE.2024.3446169
Published in: IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 7692-7707, 2024-08-20
URL: https://ieeexplore.ieee.org/document/10640244/
Budget-Constrained Ego Network Extraction With Maximized Willingness
Many large-scale machine learning approaches and graph algorithms have recently been proposed to address a variety of problems in online social networks (OSNs). Data from ego-centric networks (ego networks) are widely used to evaluate and validate these algorithms and models. Effectively extracting large-scale ego networks from OSNs has therefore become an important issue, particularly as privacy policies grow increasingly strict. In this paper, we study the problem of extracting ego network data while jointly considering user willingness, crawling cost, and network structure. We formulate a new research problem, named Structure and Willingness Aware Ego Network Extraction (SWAN), and analyze its NP-hardness. We first propose a $(1-\frac{1}{e})$-approximation algorithm, named Tristar-Optimized Ego Network Identification with Maximum Willingness (TOMW). In addition to this deterministic approximation algorithm, we propose to automatically learn an effective heuristic with machine learning, sparing humans the considerable effort of hand-designing a good algorithm. The learning approach is named Willingness-maximized and Structure-aware Ego Network Extraction with Reinforcement Learning (WSRL); within it, we propose a novel contrastive learning strategy named Contrastive Learning with Performance-boosting Graph Augmentation. We recruited 1,810 real-world participants and conducted an evaluation study to validate our problem formulation and proposed approaches. Moreover, experimental results on real social network datasets show that the proposed approaches significantly outperform the other baselines.
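The abstract states a $(1-\frac{1}{e})$ guarantee for extraction under a crawling-cost budget but does not describe how TOMW works. As a point of reference only, here is a minimal sketch of the classic partial-enumeration greedy for budgeted maximum coverage (Khuller, Moss, and Naor), a standard way to reach a $(1-\frac{1}{e})$ factor when every candidate has a cost and the total cost is capped. It is not the authors' TOMW algorithm: the weighted-coverage objective, the names `covers`, `cost`, `weight`, and `budget`, and the toy instance are hypothetical stand-ins for the paper's willingness- and structure-aware objective.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Hypothetical model: each candidate item (e.g. a user to crawl) has a positive cost
# and "covers" a set of weighted elements; the objective is total covered weight.
Item = Hashable


def coverage_value(chosen: Iterable[Item], covers: Dict[Item, Set[Hashable]],
                   weight: Dict[Hashable, float]) -> float:
    """Total weight of elements covered by at least one chosen item."""
    covered: Set[Hashable] = set()
    for item in chosen:
        covered |= covers[item]
    return sum(weight[e] for e in covered)


def greedy_extend(seed: Tuple[Item, ...], covers, cost, weight,
                  budget: float) -> Tuple[List[Item], float]:
    """Extend a feasible seed by best marginal gain per unit cost; skip items that do not fit."""
    chosen = list(seed)
    spent = sum(cost[i] for i in chosen)
    current = coverage_value(chosen, covers, weight)
    remaining = [i for i in covers if i not in seed]
    while remaining:
        best, best_ratio, best_gain = None, 0.0, 0.0
        for i in remaining:
            gain = coverage_value(chosen + [i], covers, weight) - current
            if gain / cost[i] > best_ratio:
                best, best_ratio, best_gain = i, gain / cost[i], gain
        if best is None:  # no remaining item adds any value
            break
        remaining.remove(best)  # each item is considered once, whether or not it fits
        if spent + cost[best] <= budget:
            chosen.append(best)
            spent += cost[best]
            current += best_gain
    return chosen, current


def budgeted_max_coverage(covers, cost, weight, budget) -> Tuple[List[Item], float]:
    """Partial enumeration plus ratio greedy for budgeted maximum coverage."""
    items = list(covers)
    best_set: List[Item] = []
    best_val = 0.0
    # Feasible sets of size 1 and 2 are evaluated directly as complete solutions.
    for k in (1, 2):
        for combo in combinations(items, k):
            if sum(cost[i] for i in combo) <= budget:
                val = coverage_value(combo, covers, weight)
                if val > best_val:
                    best_set, best_val = list(combo), val
    # Every feasible seed of size 3 is completed by the ratio greedy.
    for combo in combinations(items, 3):
        if sum(cost[i] for i in combo) <= budget:
            chosen, val = greedy_extend(combo, covers, cost, weight, budget)
            if val > best_val:
                best_set, best_val = chosen, val
    return best_set, best_val


if __name__ == "__main__":
    # Hypothetical toy instance: candidate users, their crawl costs, and the nodes each reveals.
    covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 5, 6}}
    cost = {"a": 2, "b": 1, "c": 1, "d": 2}
    weight = {e: 1 for e in range(1, 7)}
    print(budgeted_max_coverage(covers, cost, weight, budget=3))  # -> (['b', 'd'], 5)
```

The plain cost-ratio greedy on its own can perform arbitrarily badly under a budget; seeding it from every feasible triple is what yields the $(1-\frac{1}{e})$ bound in this classic setting. That enumeration costs $O(n^3)$ seeds, so the sketch is only suitable for small illustrative instances.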
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.