Text Reuse Detection in Handwritten Documents

Pub Date : 2024-03-11 DOI:10.1134/S106456242370120X
A. V. Grabovoy, M. S. Kaprielova, A. S. Kildyakov, I. O. Potyashin, T. B. Seyil, E. L. Finogeev, Yu. V. Chekhovich
{"title":"Text Reuse Detection in Handwritten Documents","authors":"A. V. Grabovoy,&nbsp;M. S. Kaprielova,&nbsp;A. S. Kildyakov,&nbsp;I. O. Potyashin,&nbsp;T. B. Seyil,&nbsp;E. L. Finogeev,&nbsp;Yu. V. Chekhovich","doi":"10.1134/S106456242370120X","DOIUrl":null,"url":null,"abstract":"<p>Plagiarism detection in scholar assignments becomes more and more relevant nowadays. Rapidly growing popularity of online education, active expansion of online educational platforms for secondary and high school education create demand for development of an automatic reuse detection system for handwritten assignments. The existing approaches to this problem are not usable for searching for potential sources of reuse on large collections, which significantly limits their applicability. Moreover, real-life data are likely to be low-quality photographs taken with mobile devices. We propose an approach that allows detecting text reuse in handwritten documents. Each document is a picture and the search is performed on a large collection of potential sources. The proposed method consists of three stages: handwritten text recognition, candidate search and precise source retrieval. We represent experimental results for the quality and latency estimation of our system. The recall reaches 83.3% in case of better quality pictures and 77.4% in case of pictures of lower quality. The average search time is 3.2 s per document on CPU. The results show that the created system is scalable and can be used in production, where fast reuse detection for hundreds of thousands of scholar assignments on large collection of potential reuse sources is needed. All the experiments were held on HWR200 public dataset.</p>","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"100","ListUrlMain":"https://link.springer.com/article/10.1134/S106456242370120X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Plagiarism detection in scholar assignments becomes more and more relevant nowadays. Rapidly growing popularity of online education, active expansion of online educational platforms for secondary and high school education create demand for development of an automatic reuse detection system for handwritten assignments. The existing approaches to this problem are not usable for searching for potential sources of reuse on large collections, which significantly limits their applicability. Moreover, real-life data are likely to be low-quality photographs taken with mobile devices. We propose an approach that allows detecting text reuse in handwritten documents. Each document is a picture and the search is performed on a large collection of potential sources. The proposed method consists of three stages: handwritten text recognition, candidate search and precise source retrieval. We represent experimental results for the quality and latency estimation of our system. The recall reaches 83.3% in case of better quality pictures and 77.4% in case of pictures of lower quality. The average search time is 3.2 s per document on CPU. The results show that the created system is scalable and can be used in production, where fast reuse detection for hundreds of thousands of scholar assignments on large collection of potential reuse sources is needed. All the experiments were held on HWR200 public dataset.

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
手写文件中的文本重复使用检测
摘要如今,学者作业中的剽窃检测变得越来越重要。在线教育的迅速普及、在线教育平台在中学和高中教育中的积极扩展,都要求开发一种手写作业重复使用自动检测系统。解决这一问题的现有方法无法用于搜索大量收集的潜在重复使用源,这大大限制了其适用性。此外,现实生活中的数据很可能是用移动设备拍摄的低质量照片。我们提出了一种可以检测手写文档中文本重复使用的方法。每份文档都是一张图片,搜索在大量潜在来源中进行。所提出的方法包括三个阶段:手写文本识别、候选搜索和精确来源检索。我们的系统在质量和延迟估计方面取得了实验结果。在图片质量较高的情况下,召回率达到 83.3%,在图片质量较低的情况下,召回率达到 77.4%。每个文档在 CPU 上的平均搜索时间为 3.2 秒。结果表明,所创建的系统具有可扩展性,可用于需要在大量潜在重复使用源上快速检测数十万份学者作业的生产中。所有实验都是在 HWR200 公共数据集上进行的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1