{"title":"A Visualizable Evidence-Driven Approach for Authorship Attribution","authors":"Steven H. H. Ding, B. Fung, M. Debbabi","doi":"10.1145/2699910","DOIUrl":null,"url":null,"abstract":"The Internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, chat logs, etc.) can be easily repudiated. Authorship attribution is the study of identifying the actual author of the given anonymous documents based on the text itself, and for decades, many linguistic stylometry and computational techniques have been extensively studied for this purpose. However, most of the previous research emphasizes promoting the authorship attribution accuracy, and few works have been done for the purpose of constructing and visualizing the evidential traits. In addition, these sophisticated techniques are difficult for cyber investigators or linguistic experts to interpret. In this article, based on the End-to-End Digital Investigation (EEDI) framework, we propose a visualizable evidence-driven approach, namely VEA, which aims at facilitating the work of cyber investigation. Our comprehensive controlled experiment and the stratified experiment on the real-life Enron email dataset demonstrate that our approach can achieve even higher accuracy than traditional methods; meanwhile, its output can be easily visualized and interpreted as evidential traits. In addition to identifying the most plausible author of a given text, our approach also estimates the confidence for the predicted result based on a given identification context and presents visualizable linguistic evidence for each candidate.","PeriodicalId":50912,"journal":{"name":"ACM Transactions on Information and System Security","volume":"73 1","pages":"12:1-12:30"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Information and System Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2699910","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q","JCRName":"Engineering","Score":null,"Total":0}
引用次数: 25
Abstract
The Internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, chat logs, etc.) can be easily repudiated. Authorship attribution is the study of identifying the actual author of the given anonymous documents based on the text itself, and for decades, many linguistic stylometry and computational techniques have been extensively studied for this purpose. However, most of the previous research emphasizes promoting the authorship attribution accuracy, and few works have been done for the purpose of constructing and visualizing the evidential traits. In addition, these sophisticated techniques are difficult for cyber investigators or linguistic experts to interpret. In this article, based on the End-to-End Digital Investigation (EEDI) framework, we propose a visualizable evidence-driven approach, namely VEA, which aims at facilitating the work of cyber investigation. Our comprehensive controlled experiment and the stratified experiment on the real-life Enron email dataset demonstrate that our approach can achieve even higher accuracy than traditional methods; meanwhile, its output can be easily visualized and interpreted as evidential traits. In addition to identifying the most plausible author of a given text, our approach also estimates the confidence for the predicted result based on a given identification context and presents visualizable linguistic evidence for each candidate.
期刊介绍:
ISSEC is a scholarly, scientific journal that publishes original research papers in all areas of information and system security, including technologies, systems, applications, and policies.