{"title":"Finding the Source in Networks: An Approach Based on Structural Entropy","authors":"Chong Zhang, Qiang Guo, Luoyi Fu, Jiaxin Ding, Xinde Cao, Fei Long, Xinbing Wang, Cheng Zhou","doi":"10.1145/3568309","DOIUrl":null,"url":null,"abstract":"The popularity of intelligent devices provides straightforward access to the Internet and online social networks. However, the quick and easy data updates from networks also benefit the risk spreading, such as rumor, malware, or computer viruses. To this end, this article studies the problem of source detection, which is to infer the source node out of an aftermath of a cascade, that is, the observed infected graph GN of the network at some time. Prior arts have adopted various statistical quantities such as degree, distance, or infection size to reflect the structural centrality of the source. In this article, we propose a new metric that we call the infected tree entropy (ITE), to utilize richer underlying structural features for source detection. Our idea of ITE is inspired by the conception of structural entropy [21], which demonstrated that the minimization of average bits to encode the network structures with different partitions is the principle for detecting the natural or true structures in real-world networks. Accordingly, our proposed ITE based estimator for the source tries to minimize the coding of network partitions brought by the infected tree rooted at all the potential sources, thus minimizing the structural deviation between the cascades from the potential sources and the actual infection process included in GN. On polynomially growing geometric trees, with increasing tree heterogeneity, the ITE estimator remarkably yields more reliable detection under only moderate infection sizes, and returns an asymptotically complete detection. In contrast, for regular expanding trees, we still observe guaranteed detection probability of ITE estimator even with an infinite infection size, thanks to the degree regularity property. We also algorithmically realize the ITE based detection that enjoys linear time complexity via a message-passing scheme, and further extend it to general graphs. Extensive experiments on synthetic and real datasets confirm the superiority of ITE to the baselines. For example, ITE returns an accuracy of 85%, ranking the source among the top 10%, far exceeding 55% of the classic algorithm on scale-free networks.","PeriodicalId":50911,"journal":{"name":"ACM Transactions on Internet Technology","volume":"23 1","pages":"1 - 25"},"PeriodicalIF":3.9000,"publicationDate":"2023-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Internet Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3568309","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
The popularity of intelligent devices provides straightforward access to the Internet and online social networks. However, the quick and easy data updates from networks also benefit the risk spreading, such as rumor, malware, or computer viruses. To this end, this article studies the problem of source detection, which is to infer the source node out of an aftermath of a cascade, that is, the observed infected graph GN of the network at some time. Prior arts have adopted various statistical quantities such as degree, distance, or infection size to reflect the structural centrality of the source. In this article, we propose a new metric that we call the infected tree entropy (ITE), to utilize richer underlying structural features for source detection. Our idea of ITE is inspired by the conception of structural entropy [21], which demonstrated that the minimization of average bits to encode the network structures with different partitions is the principle for detecting the natural or true structures in real-world networks. Accordingly, our proposed ITE based estimator for the source tries to minimize the coding of network partitions brought by the infected tree rooted at all the potential sources, thus minimizing the structural deviation between the cascades from the potential sources and the actual infection process included in GN. On polynomially growing geometric trees, with increasing tree heterogeneity, the ITE estimator remarkably yields more reliable detection under only moderate infection sizes, and returns an asymptotically complete detection. In contrast, for regular expanding trees, we still observe guaranteed detection probability of ITE estimator even with an infinite infection size, thanks to the degree regularity property. We also algorithmically realize the ITE based detection that enjoys linear time complexity via a message-passing scheme, and further extend it to general graphs. Extensive experiments on synthetic and real datasets confirm the superiority of ITE to the baselines. For example, ITE returns an accuracy of 85%, ranking the source among the top 10%, far exceeding 55% of the classic algorithm on scale-free networks.
期刊介绍:
ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, performance and scalability etc. TOIT will cover the results and roles of the individual disciplines and the relationshipsamong them.