Social Network Signatures: A Framework for Re-identification in Networked Data and Experimental Results

Shawndra Hill, A. Nagle
Published in: 2009 International Conference on Computational Aspects of Social Networks
Publication date: 2009-02-11
DOI: 10.2139/ssrn.1341394 (https://doi.org/10.2139/ssrn.1341394)
Citations: 9

Abstract

Data on large dynamic social networks, such as telecommunications networks and the Internet, are pervasive. However, few methods conducive to efficient large-scale analysis exist. In this paper, we focus on the task of re-identification. Re-identification in the context of dynamic networks is a matching problem that involves comparing the behavior of networked entities across two time periods. Prior research has reported success in the domains of e-mail alias detection, author attribution, and identifying fraudulent consumers in the telecommunications industry. In this work, we address the question of "why are we able to re-identify entities on real-world dynamic networks?" Our contribution is two-fold. First, we address the challenge of scale with a framework for matching that does not require pairwise comparisons to ascertain the similarity scores between networked entities. Second, we show our method is robust against missing links but less tolerant of noise. Using our framework, we provide a performance estimate for re-identification on networks based solely on their degree distribution and dynamics. This work has significant implications for re-identification problems where scale is a challenge as well as for problems where false negatives (e.g., when fraudulent consumers are not labeled as fraudulent) cannot be observed.
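To make the matching problem concrete, the following is a minimal illustrative sketch, not the authors' framework, which the abstract does not specify. It assumes that an entity's "signature" is simply the set of neighbours it contacted in a period, that similarity is Jaccard overlap, and that an inverted index over neighbours is one plausible way to avoid scoring every pair of entities. The function names `build_signatures` and `reidentify` and the toy edge lists are hypothetical.

```python
from collections import Counter, defaultdict


def build_signatures(edges):
    """Map each entity to the set of neighbours it contacted in one period.

    Assumption: a signature is an unweighted neighbour set.
    """
    sigs = defaultdict(set)
    for src, dst in edges:
        sigs[src].add(dst)
    return dict(sigs)


def reidentify(sigs_t1, sigs_t2):
    """For each period-2 entity, find the most similar period-1 entity.

    Only entities sharing at least one neighbour are ever scored, so no
    all-pairs comparison is performed (an assumed strategy for this sketch).
    """
    # Inverted index: neighbour -> period-1 entities that contacted it.
    index = defaultdict(set)
    for entity, sig in sigs_t1.items():
        for neighbour in sig:
            index[neighbour].add(entity)

    matches = {}
    for entity2, sig2 in sigs_t2.items():
        # Count shared neighbours with each candidate from period 1.
        shared = Counter()
        for neighbour in sig2:
            for entity1 in index.get(neighbour, ()):
                shared[entity1] += 1

        best, best_score = None, 0.0
        for entity1, inter in shared.items():
            union = len(sig2) + len(sigs_t1[entity1]) - inter
            score = inter / union  # Jaccard similarity
            if score > best_score:
                best, best_score = entity1, score
        matches[entity2] = (best, best_score)
    return matches


# Toy data: "u1" and "u2" are period-2 aliases of unknown identity.
period1 = [("alice", "x"), ("alice", "y"), ("alice", "z"),
           ("bob", "p"), ("bob", "q")]
period2 = [("u1", "x"), ("u1", "y"),             # one link missing
           ("u2", "p"), ("u2", "q"), ("u2", "r")]  # one extra (noisy) link

result = reidentify(build_signatures(period1), build_signatures(period2))
```

In this toy case `u1` is matched to `alice` and `u2` to `bob`. Under these assumptions, a dropped link and an added link lower the Jaccard score by the same amount, so the sketch does not by itself reproduce the paper's finding that missing links are tolerated better than noise.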