A Visualizable Evidence-Driven Approach for Authorship Attribution

Steven H. H. Ding, B. Fung, M. Debbabi
{"title":"A Visualizable Evidence-Driven Approach for Authorship Attribution","authors":"Steven H. H. Ding, B. Fung, M. Debbabi","doi":"10.1145/2699910","DOIUrl":null,"url":null,"abstract":"The Internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, chat logs, etc.) can be easily repudiated. Authorship attribution is the study of identifying the actual author of the given anonymous documents based on the text itself, and for decades, many linguistic stylometry and computational techniques have been extensively studied for this purpose. However, most of the previous research emphasizes promoting the authorship attribution accuracy, and few works have been done for the purpose of constructing and visualizing the evidential traits. In addition, these sophisticated techniques are difficult for cyber investigators or linguistic experts to interpret. In this article, based on the End-to-End Digital Investigation (EEDI) framework, we propose a visualizable evidence-driven approach, namely VEA, which aims at facilitating the work of cyber investigation. Our comprehensive controlled experiment and the stratified experiment on the real-life Enron email dataset demonstrate that our approach can achieve even higher accuracy than traditional methods; meanwhile, its output can be easily visualized and interpreted as evidential traits. In addition to identifying the most plausible author of a given text, our approach also estimates the confidence for the predicted result based on a given identification context and presents visualizable linguistic evidence for each candidate.","PeriodicalId":50912,"journal":{"name":"ACM Transactions on Information and System Security","volume":"73 1","pages":"12:1-12:30"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Information and System Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2699910","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q","JCRName":"Engineering","Score":null,"Total":0}
引用次数: 25

Abstract

The Internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, chat logs, etc.) can be easily repudiated. Authorship attribution is the study of identifying the actual author of the given anonymous documents based on the text itself, and for decades, many linguistic stylometry and computational techniques have been extensively studied for this purpose. However, most of the previous research emphasizes promoting the authorship attribution accuracy, and few works have been done for the purpose of constructing and visualizing the evidential traits. In addition, these sophisticated techniques are difficult for cyber investigators or linguistic experts to interpret. In this article, based on the End-to-End Digital Investigation (EEDI) framework, we propose a visualizable evidence-driven approach, namely VEA, which aims at facilitating the work of cyber investigation. Our comprehensive controlled experiment and the stratified experiment on the real-life Enron email dataset demonstrate that our approach can achieve even higher accuracy than traditional methods; meanwhile, its output can be easily visualized and interpreted as evidential traits. In addition to identifying the most plausible author of a given text, our approach also estimates the confidence for the predicted result based on a given identification context and presents visualizable linguistic evidence for each candidate.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
作者归属的可视化证据驱动方法
互联网为隐藏计算机媒介的恶意活动提供了一个理想的匿名渠道,因为基于网络的关键电子文本证据(例如,电子邮件、博客、论坛帖子、聊天记录等)可以很容易地被否定。作者归属是基于文本本身确定给定匿名文档的实际作者的研究,几十年来,许多语言文体学和计算技术已经为此目的进行了广泛的研究。然而,以往的研究大多侧重于提高作者归属的准确性,很少有针对证据特征的构建和可视化的研究。此外,这些复杂的技术对网络调查人员或语言专家来说很难解释。在本文中,基于端到端数字调查(EEDI)框架,我们提出了一种可视化的证据驱动方法,即VEA,旨在促进网络调查工作。我们的综合控制实验和真实安然电子邮件数据集的分层实验表明,我们的方法可以达到比传统方法更高的准确性;同时,它的输出可以很容易地可视化和解释为证据特征。除了确定给定文本中最可信的作者之外,我们的方法还根据给定的识别上下文估计预测结果的置信度,并为每个候选人提供可视化的语言证据。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
ACM Transactions on Information and System Security
ACM Transactions on Information and System Security 工程技术-计算机:信息系统
CiteScore
4.50
自引率
0.00%
发文量
0
审稿时长
3.3 months
期刊介绍: ISSEC is a scholarly, scientific journal that publishes original research papers in all areas of information and system security, including technologies, systems, applications, and policies.
期刊最新文献
An Efficient User Verification System Using Angle-Based Mouse Movement Biometrics A New Framework for Privacy-Preserving Aggregation of Time-Series Data Behavioral Study of Users When Interacting with Active Honeytokens Model Checking Distributed Mandatory Access Control Policies Randomization-Based Intrusion Detection System for Advanced Metering Infrastructure*
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1