Evaluation of the Stability of Four Document Segmentation Algorithms

Sébastien Eskenazi, Petra Gomez-Krämer, J. Ogier
Published in: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), 2016-04-11. DOI: 10.1109/DAS.2016.25 (https://doi.org/10.1109/DAS.2016.25)
Citations: 3

Abstract

The importance of stable information extraction algorithms for security-related applications, and more generally for industrial use cases, has recently been highlighted. Stability is what makes an algorithm reliable: it guarantees that results are reproducible on similar data. Without it, security criteria such as the probability of false positives cannot be quantified, and consequently no security application can be built on an unstable algorithm. In a document verification framework, the probability of false positives is the probability that two copies of the same document yield two different results. This paper builds on our previous work on a stable layout descriptor to study the stability of four segmentation algorithms. We consider a segmentation algorithm stable if it produces the same layout for all copies of the same document. The algorithms studied are two versions of PAL, Voronoi, and JSEG. We compare the stability of these algorithms and study the factors that influence it.
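
The stability definition above can be made concrete with a small sketch. This is an illustration only, not the authors' metric: the abstract does not specify the layout descriptor or the scoring used in the paper. The sketch assumes each copy of a document has already been segmented and reduced to a hashable layout descriptor (here a hypothetical string such as "T|T|I"), and the function names `is_stable` and `pairwise_disagreement` are invented for this example.

```python
from itertools import combinations


def is_stable(descriptors):
    """True if every copy of the document produced the same layout descriptor.

    `descriptors` is a list of hashable values, one per copy (an assumption
    of this sketch; the paper's descriptor format is not given here).
    """
    return len(set(descriptors)) <= 1


def pairwise_disagreement(descriptors):
    """Fraction of copy pairs whose descriptors differ.

    This is one simple empirical proxy for the probability that two copies
    of the same document yield two different results.
    """
    pairs = list(combinations(descriptors, 2))
    if not pairs:
        return 0.0  # fewer than two copies: nothing to compare
    differing = sum(1 for a, b in pairs if a != b)
    return differing / len(pairs)


# Hypothetical descriptors for four scans of one document.
copies = ["T|T|I", "T|T|I", "T|I", "T|T|I"]
print(is_stable(copies))
print(pairwise_disagreement(copies))
```

With a strict equality test like this, a single deviating copy makes the algorithm unstable on that document, which matches the "same layout for all copies" wording. The pairwise fraction gives a graded view of how often copies disagree.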