Likelihood ratios for categorical count data with applications in digital forensics

Rachel Longjohn, Padhraic Smyth, Hal S Stern
Law, Probability and Risk. Published 2022-12-23. DOI: 10.1093/lpr/mgac016 (https://doi.org/10.1093/lpr/mgac016)

Abstract

We consider the forensic context in which the goal is to assess whether two sets of observed data came from the same source or from different sources. In particular, we focus on the situation in which the evidence consists of two sets of categorical count data: a set of event counts from an unknown source tied to a crime and a set of event counts generated by a known source. Using a same-source versus different-source hypothesis framework, we develop an approach to calculating a likelihood ratio. Under our proposed model, the likelihood ratio can be calculated in closed form, and we use this to theoretically analyse how the likelihood ratio is affected by how much data is observed, the number of event types being considered, and the prior used in the Bayesian model. Our work is motivated in particular by user-generated event data in digital forensics, a context in which relatively few statistical methodologies have yet been developed to support quantitative analysis of event data after it is extracted from a device. We evaluate our proposed method through experiments using three real-world event datasets, representing a variety of event types that may arise in digital forensics. The results of the theoretical analyses and experiments with real-world datasets demonstrate that while this model is a useful starting point for the statistical forensic analysis of user-generated event data, more work is needed before it can be applied for practical use.
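
To illustrate how a likelihood ratio for two count vectors can be available in closed form, here is a minimal sketch. The abstract does not name the model, so the choices below are assumptions made only for illustration: counts are treated as multinomial draws, the category probabilities get a Dirichlet prior with parameter vector alpha, the same-source hypothesis shares one probability vector across both sets, and the different-source hypothesis gives each set its own independent draw from the same prior. Under those assumptions the multinomial coefficients cancel and LR = B(alpha + x + y) B(alpha) / (B(alpha + x) B(alpha + y)), where B is the multivariate beta function. The function names are hypothetical.

```python
import math


def log_multivariate_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def log_likelihood_ratio(x, y, alpha):
    """Log LR of same-source vs different-source for count vectors x and y.

    Assumed model (not taken from the abstract): multinomial counts with a
    Dirichlet(alpha) prior on the category probabilities.
    """
    if not (len(x) == len(y) == len(alpha)):
        raise ValueError("x, y and alpha must have one entry per event type")
    if any(a <= 0 for a in alpha):
        raise ValueError("Dirichlet parameters must be positive")

    alpha_x = [a + c for a, c in zip(alpha, x)]
    alpha_y = [a + c for a, c in zip(alpha, y)]
    alpha_xy = [a + cx + cy for a, cx, cy in zip(alpha, x, y)]

    # Numerator: both count sets explained by one shared probability vector.
    log_same = log_multivariate_beta(alpha_xy) + log_multivariate_beta(alpha)
    # Denominator: each count set explained by its own probability vector.
    log_diff = log_multivariate_beta(alpha_x) + log_multivariate_beta(alpha_y)
    return log_same - log_diff


if __name__ == "__main__":
    uniform_prior = [1.0, 1.0, 1.0]
    similar = log_likelihood_ratio([8, 1, 1], [7, 2, 1], uniform_prior)
    dissimilar = log_likelihood_ratio([8, 1, 1], [1, 1, 8], uniform_prior)
    print(f"log LR, similar profiles:    {similar:.3f}")
    print(f"log LR, dissimilar profiles: {dissimilar:.3f}")
```

Working on the log scale with math.lgamma avoids overflow when counts are large. The sketch also makes the three dependencies named in the abstract concrete: the totals in x and y (how much data is observed), the length of the vectors (the number of event types), and alpha (the prior).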