Doppelgänger Finder: Taking Stylometry to the Underground

Sadia Afroz, Aylin Caliskan, Ariel Stolerman, R. Greenstadt, Damon McCoy
Published in: 2014 IEEE Symposium on Security and Privacy, 2014-05-18
DOI: 10.1109/SP.2014.21 (https://doi.org/10.1109/SP.2014.21)
Citations: 128

Abstract

Stylometry is a method for identifying the authors of anonymous texts by analyzing their writing style. While stylometric methods have produced impressive results in previous experiments, we wanted to explore their performance on a challenging dataset of particular interest to the security research community. Analysis of underground forums can provide key information about who controls a given bot network or sells a service, and about the size and scope of the cybercrime underworld. Previous analyses have relied primarily on limited structured metadata and painstaking manual work. The key challenge is to automate this process, since the labor-intensive manual approach clearly does not scale. We consider two scenarios. The first involves text written by an unknown cybercriminal and a set of potential suspects. This is a standard supervised stylometry problem, made more difficult by multilingual forums that mix l33t-speak conversations with data dumps. In the second scenario, you want to feed a forum into an analysis engine and have it output possible doppelgangers, or users with multiple accounts. While other researchers have explored this problem, we propose a method that produces good results on actual separate accounts, as opposed to data sets created by artificially splitting authors into multiple identities. For scenario 1, we achieve 77% to 84% accuracy on private messages. For scenario 2, we achieve 94% recall with 90% precision on blogs and 85.18% precision with 82.14% recall for underground forum users. We demonstrate the utility of our approach with a case study that includes applying our technique to the Carders forum and manual analysis to validate the results, enabling the discovery of previously undetected doppelganger accounts.
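To make the two scenarios concrete, here is a minimal sketch in Python. It is not the method of the paper: the abstract does not name the features or the classifier, so the character trigram profiles, the cosine similarity, the fixed threshold, and every function name below are assumptions chosen only for illustration. The last helper shows how precision and recall over predicted account pairs could be computed.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text (assumed feature set)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(unknown_text, suspects):
    """Scenario 1 (illustrative): pick the suspect whose writing is closest to the unknown text."""
    target = char_ngrams(unknown_text)
    return max(suspects, key=lambda name: cosine(target, char_ngrams(suspects[name])))


def candidate_doppelgangers(accounts, threshold):
    """Scenario 2 (illustrative): list account pairs whose similarity meets the threshold."""
    profiles = {name: char_ngrams(text) for name, text in accounts.items()}
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        score = cosine(profiles[a], profiles[b])
        if score >= threshold:
            pairs.append((a, b, score))
    # Most similar pairs first, so an analyst can review them in order.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def precision_recall(predicted, actual):
    """Precision and recall of predicted pairs against known true pairs (order-insensitive)."""
    predicted = {frozenset(p) for p in predicted}
    actual = {frozenset(p) for p in actual}
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

In this sketch, `attribute(message, {"suspect_a": text_a, "suspect_b": text_b})` returns the name of the closest suspect, and `candidate_doppelgangers(accounts, threshold)` returns `(account, account, score)` tuples for manual review. The threshold trades precision against recall, which is why the abstract reports the two figures together for scenario 2.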