Handwriting identification using random forests and score‐based likelihood ratios

M. Q. Johnson, Danica M. Ommen
Journal: Statistical Analysis and Data Mining: The ASA Data Science Journal
Published: 2021-12-03
DOI: https://doi.org/10.1002/sam.11566
Citations: 4

Abstract

Handwriting analysis is conducted by forensic document examiners, who visually recognize characteristics of writing to evaluate the evidence of writership. Recently, there have been incentives to investigate how to quantify the similarity between two written documents to support the conclusions drawn by experts. We use an automatic algorithm within the “handwriter” package in R to decompose a handwritten sample into small graphical units of writing (graphs). These graphs are sorted into 40 exemplar groups or clusters. We hypothesize that the frequency with which a person contributes graphs to each cluster is characteristic of their handwriting. Given two questioned handwritten documents, we can then use the vectors of cluster frequencies to quantify the similarity between the two documents. We extract features from the difference between the vectors and combine them using a random forest. The output from the random forest is used as the similarity score to compare documents. We estimate the distributions of the similarity scores computed from multiple pairs of documents known to have been written by the same person and by different persons, and use these estimated densities to obtain score‐based likelihood ratios (SLRs) that rely on different assumptions. We find that the SLRs are able to indicate whether the similarity observed between two documents is more or less likely depending on writership.
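The pipeline described in the abstract can be illustrated with a minimal Python sketch, offered under stated assumptions rather than as the authors' implementation. It shows three steps: turning a document's cluster assignments into a frequency vector, taking per-cluster differences between two documents, and forming an SLR as the ratio of two estimated score densities. The random forest that produces the similarity score is not reproduced here; scores are taken as given. The function names, the absolute-difference feature, the Gaussian kernel, the bandwidth of 0.1, and all numeric scores are assumptions made for illustration and do not come from the paper or from the “handwriter” package.

```python
import math
from collections import Counter

N_CLUSTERS = 40  # number of exemplar clusters stated in the abstract


def cluster_frequencies(assignments, n_clusters=N_CLUSTERS):
    """Relative frequency of each cluster among one document's graphs."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def difference_features(freq_a, freq_b):
    """Per-cluster absolute differences (an assumed feature choice)."""
    return [abs(a - b) for a, b in zip(freq_a, freq_b)]


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated with a Gaussian kernel."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        kernel_sum = sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )
        return kernel_sum / norm

    return density


def score_based_lr(score, same_scores, diff_scores, bandwidth=0.1):
    """SLR: score density under same-writer pairs over different-writer pairs."""
    f_same = gaussian_kde(same_scores, bandwidth)
    f_diff = gaussian_kde(diff_scores, bandwidth)
    return f_same(score) / f_diff(score)


# Made-up similarity scores in [0, 1]; NOT data from the paper.
same_scores = [0.81, 0.88, 0.92, 0.76, 0.85, 0.90]
diff_scores = [0.12, 0.25, 0.31, 0.18, 0.40, 0.22]

slr_high = score_based_lr(0.85, same_scores, diff_scores)
slr_low = score_based_lr(0.20, same_scores, diff_scores)
```

With these made-up inputs, a score that sits among the same-writer scores yields an SLR above 1, and a score that sits among the different-writer scores yields an SLR below 1. In practice the bandwidth and the choice of reference pairs strongly affect the estimated densities, which is one reason the abstract speaks of SLRs that rely on different assumptions.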