Stylometric Anonymity: Is Imitation the Best Strategy?

Mahmoud Khonji, Y. Iraqi
2015 IEEE Trustcom/BigDataSE/ISPA, published 2015-08-20
DOI: 10.1109/Trustcom.2015.472 (https://doi.org/10.1109/Trustcom.2015.472)
Citations: 1

Abstract

Stylometric analysis of electronic texts can reveal information about their authors by examining the stylistic choices the authors make when writing. Such information may include the identity of suspect authors or profile attributes such as gender, age group, or ethnic group. Therefore, when preserving an author's anonymity is critical, as for a whistleblower, it is important to ensure the stylistic anonymity of the text itself, in addition to anonymizing the communication channel (e.g., using Tor or minimizing application fingerprints). Currently, only two stylistic anonymization strategies are known: imitation attacks and obfuscation attacks. A long-term objective is to find automated methods that reliably transform input texts so that the output maximizes author anonymity while reasonably preserving the semantics of the input. Before pursuing this objective, it is important to first identify which strategies maximize stylistic anonymity. The current literature implies that imitation attacks preserve author anonymity better than obfuscation attacks. However, we argue that these evaluations are limited and should not be generalized to stylistic anonymity as a whole, because they were executed only against authorship attribution (AA) solvers, which address a closed-set problem. In this study, we extend such evaluations to state-of-the-art authorship verification (AV) solvers, which address an open-set problem. Our results show that imitation attacks degrade the classification accuracy of AV solvers more aggressively than that of AA solvers. We argue that this reduction of accuracy below random chance renders imitation attacks inferior to obfuscation attacks. Furthermore, we present a general formal notation for stylometry problems and, based on it, conjecture that the same observations apply to all stylometry problems (AA, AV, AP, SI).
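The abstract does not spell out why pushing accuracy below random chance counts against imitation. One possible reading, sketched here as an assumption and not taken from the paper, is that a binary verifier that is systematically wrong still leaks information. Flipping each of its same-author/different-author decisions turns an accuracy of p into 1 - p. The labels and decisions below are hypothetical toy values, not results from the study, and `accuracy` and `invert` are illustrative helpers.

```python
from typing import List, Sequence


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of binary same-author decisions that match the ground truth."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def invert(predictions: Sequence[bool]) -> List[bool]:
    """Flip every decision: 'same author' becomes 'different author' and vice versa."""
    return [not p for p in predictions]


# Hypothetical ground truth for 10 verification problems (True = same author),
# balanced 5/5 so that uniform random guessing scores 0.5 in expectation.
labels = [True, True, False, False, True, False, True, False, True, False]

# Hypothetical decisions of a verifier degraded by an attack: 3 of 10 correct.
attacked = [False, False, True, True, False, True, True, False, True, True]

print(accuracy(attacked, labels))          # 0.3, below chance
print(accuracy(invert(attacked), labels))  # 0.7, above chance after flipping
```

The sketch works on boolean decisions rather than similarity scores, so it does not depend on how any particular AV solver is built. It only assumes that the attack's effect on the decisions is systematic.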