Social Network Signatures: A Framework for Re-identification in Networked Data and Experimental Results

2009 International Conference on Computational Aspects of Social Networks. Pub Date: 2009-02-11. DOI: 10.2139/ssrn.1341394

Shawndra Hill, A. Nagle


Citations: 9


Social Network Signatures: A Framework for Re-identification in Networked Data and Experimental Results

Data on large dynamic social networks, such as telecommunications networks and the Internet, are pervasive. However, few methods conducive to efficient large-scale analysis exist. In this paper, we focus on the task of re-identification. Re-identification in the context of dynamic networks is a matching problem that involves comparing the behavior of networked entities across two time periods. Prior research has reported success in the domains of e-mail alias detection, author attribution, and identifying fraudulent consumers in the telecommunications industry. In this work, we address the question, "Why are we able to re-identify entities on real-world dynamic networks?" Our contribution is two-fold. First, we address the challenge of scale with a framework for matching that does not require pairwise comparisons to ascertain the similarity scores between networked entities. Second, we show that our method is robust against missing links but less tolerant of noise. Using our framework, we provide a performance estimate for re-identification on networks based solely on their degree distribution and dynamics. This work has significant implications for re-identification problems where scale is a challenge, as well as for problems where false negatives (e.g., when fraudulent consumers are not labeled as fraudulent) cannot be observed.
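To make the matching problem concrete, the following is a minimal sketch, not the authors' framework, which the abstract does not specify. It assumes a hypothetical signature (each entity's top-k most frequent contacts), Jaccard similarity as the score, and an inverted index so that an entity from the second period is scored only against first-period entities sharing at least one contact, rather than against every entity. All function names, the parameter `k`, and the toy edge lists are illustrative assumptions.

```python
from collections import Counter, defaultdict


def build_signatures(edges, k=3):
    """Map each source node to the set of its k most frequent contacts.

    `edges` is an iterable of (source, destination) pairs for one time period.
    The top-k signature is an assumption made for illustration only.
    """
    counts = defaultdict(Counter)
    for src, dst in edges:
        counts[src][dst] += 1
    signatures = {}
    for node, contact_counts in counts.items():
        # Rank by descending frequency, breaking ties by contact name.
        ranked = sorted(contact_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        signatures[node] = frozenset(contact for contact, _ in ranked[:k])
    return signatures


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reidentify(sigs_t1, sigs_t2):
    """Match each period-2 node to its most similar period-1 node.

    An inverted index (contact -> period-1 nodes) limits scoring to
    candidates that share at least one contact with the query node.
    """
    index = defaultdict(set)
    for node, sig in sigs_t1.items():
        for contact in sig:
            index[contact].add(node)

    matches = {}
    for node, sig in sigs_t2.items():
        candidates = set().union(*(index.get(c, set()) for c in sig))
        if candidates:
            # Sorting first makes tie-breaking deterministic.
            matches[node] = max(
                sorted(candidates),
                key=lambda cand: jaccard(sig, sigs_t1[cand]),
            )
        else:
            matches[node] = None  # no shared contacts, so no candidate
    return matches


# Toy data: the same two entities appear under new labels in period 2.
period_1 = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "z"), ("b", "w")]
period_2 = [("u1", "x"), ("u1", "y"), ("u2", "z"), ("u2", "w"), ("u2", "q")]

matches = reidentify(build_signatures(period_1), build_signatures(period_2))
print(matches)
```

In this toy case `u2` has gained a contact (`q`) that `b` never had, yet it still matches `b` through the two contacts they share. How such perturbations affect accuracy at scale is the question the paper studies; the sketch itself makes no claim about that.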
