A Visualizable Evidence-Driven Approach for Authorship Attribution

ACM Transactions on Information and System Security. Pub Date: 2015-03-27. Pages: 12:1-12:30. DOI: 10.1145/2699910

Steven H. H. Ding, B. Fung, M. Debbabi


Citations: 25

Abstract

The Internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, because the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, and chat logs) can be easily repudiated. Authorship attribution is the study of identifying the actual author of given anonymous documents based on the text itself, and for decades many linguistic stylometry and computational techniques have been studied extensively for this purpose. However, most previous research emphasizes improving attribution accuracy, and little work has been done on constructing and visualizing the evidential traits. In addition, these sophisticated techniques are difficult for cyber investigators or linguistic experts to interpret. In this article, based on the End-to-End Digital Investigation (EEDI) framework, we propose a visualizable evidence-driven approach, namely VEA, which aims to facilitate the work of cyber investigation. Our comprehensive controlled experiment and the stratified experiment on the real-life Enron email dataset demonstrate that our approach can achieve higher accuracy than traditional methods, while its output can be easily visualized and interpreted as evidential traits. In addition to identifying the most plausible author of a given text, our approach also estimates the confidence of the predicted result based on a given identification context and presents visualizable linguistic evidence for each candidate.
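To make the three outputs named in the abstract concrete (a most plausible author, a confidence per candidate, and inspectable evidence), here is a minimal sketch. It is not the VEA method, whose details are not given here. It assumes a generic stylometric baseline instead: character trigram profiles compared by cosine similarity. The function names, the toy texts, and the use of a normalised similarity share as "confidence" are all illustrative assumptions.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(anonymous_text, candidate_texts, n=3, top_k=3):
    """Score each candidate and list the shared n-grams that weigh most."""
    target = char_ngrams(anonymous_text, n)
    scores, evidence = {}, {}
    for author, samples in candidate_texts.items():
        profile = char_ngrams(" ".join(samples), n)
        scores[author] = cosine(target, profile)
        # Each shared n-gram's contribution to the dot product serves as evidence.
        shared = {g: target[g] * profile[g] for g in target if g in profile}
        evidence[author] = sorted(shared, key=shared.get, reverse=True)[:top_k]
    total = sum(scores.values())
    # Normalised similarity share: a crude stand-in for a confidence estimate.
    confidence = {a: (s / total if total else 0.0) for a, s in scores.items()}
    best = max(scores, key=scores.get)
    return best, confidence, evidence


# Hypothetical toy data, invented purely for illustration.
candidates = {
    "alice": ["Kindly find the report attached. Kindly confirm receipt."],
    "bob": ["yo, report's done!! lemme know if u got it!!"],
}
best, confidence, evidence = attribute("Kindly confirm the meeting time.", candidates)
print(best, confidence, evidence)
```

The evidence lists are what an investigator could inspect or plot per candidate. A real system would need far larger writing samples and a calibrated confidence measure rather than this similarity share.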


Source journal

ACM Transactions on Information and System Security (Engineering and Technology; Computer Science: Information Systems)

CiteScore: 4.50

Self-citation rate: 0.00%

Review time: 3.3 months

Journal description: TISSEC is a scholarly, scientific journal that publishes original research papers in all areas of information and system security, including technologies, systems, applications, and policies.

Latest articles in this journal

An Efficient User Verification System Using Angle-Based Mouse Movement Biometrics
A New Framework for Privacy-Preserving Aggregation of Time-Series Data
Behavioral Study of Users When Interacting with Active Honeytokens
Model Checking Distributed Mandatory Access Control Policies
Randomization-Based Intrusion Detection System for Advanced Metering Infrastructure