面向改进表单处理的错误源分析

U. Bhattacharya, Bikash Shaw, S. K. Parui
{"title":"面向改进表单处理的错误源分析","authors":"U. Bhattacharya, Bikash Shaw, S. K. Parui","doi":"10.1109/ICIT.2006.30","DOIUrl":null,"url":null,"abstract":"Automatic form processing is an important application of document analysis subject. Such a system requires to be trained and tested on a standard database of forms collected from real-life. However, to the best of our knowledge, the only such available databases are NIST Special Databases. These databases consist of images of synthesized form documents. On the other hand, recently we developed a form database, samples of which had been taken from the real-life. ISIFormReader, a form processing system, also developed recently, has been tested using these real-life samples. An intensive study of the processing errors showed that writers' idiosyncracies are one of the major reasons of such errors as analyzed in U. Bhattacharya, et al., (2006). In the present paper, we investigated various other sources of errors which together cause a major concern. These include sample forms which are low in contrast, noisy, smudgy, skewed, scaled disturbing its aspect ratio and so on. An analysis of errors due to similar such sources is important towards development of an improved form processing system.","PeriodicalId":161120,"journal":{"name":"9th International Conference on Information Technology (ICIT'06)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Analysis of Error Sources Towards Improved Form Processing\",\"authors\":\"U. Bhattacharya, Bikash Shaw, S. K. Parui\",\"doi\":\"10.1109/ICIT.2006.30\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic form processing is an important application of document analysis subject. Such a system requires to be trained and tested on a standard database of forms collected from real-life. However, to the best of our knowledge, the only such available databases are NIST Special Databases. These databases consist of images of synthesized form documents. On the other hand, recently we developed a form database, samples of which had been taken from the real-life. ISIFormReader, a form processing system, also developed recently, has been tested using these real-life samples. An intensive study of the processing errors showed that writers' idiosyncracies are one of the major reasons of such errors as analyzed in U. Bhattacharya, et al., (2006). In the present paper, we investigated various other sources of errors which together cause a major concern. These include sample forms which are low in contrast, noisy, smudgy, skewed, scaled disturbing its aspect ratio and so on. An analysis of errors due to similar such sources is important towards development of an improved form processing system.\",\"PeriodicalId\":161120,\"journal\":{\"name\":\"9th International Conference on Information Technology (ICIT'06)\",\"volume\":\"101 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-12-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"9th International Conference on Information Technology (ICIT'06)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIT.2006.30\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"9th International Conference on Information Technology (ICIT'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIT.2006.30","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

自动表单处理是文档分析学科的一个重要应用。这样的系统需要在一个从现实生活中收集的表格的标准数据库上进行训练和测试。然而,据我们所知,唯一可用的数据库是NIST专用数据库。这些数据库由合成表单文档的图像组成。另一方面,最近我们开发了一个表单数据库,其中的样本来自现实生活。ISIFormReader,一个最近开发的表单处理系统,已经用这些真实的样本进行了测试。对加工错误的深入研究表明,作家的特质是产生这种错误的主要原因之一,U. Bhattacharya等人(2006)分析了这一点。在本文中,我们调查了引起主要关注的各种其他错误来源。这些包括低对比度,嘈杂,污浊,倾斜,缩放干扰其长宽比的样本形式等等。分析由类似来源引起的错误对于改进表单处理系统的开发非常重要。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Analysis of Error Sources Towards Improved Form Processing
Automatic form processing is an important application of document analysis subject. Such a system requires to be trained and tested on a standard database of forms collected from real-life. However, to the best of our knowledge, the only such available databases are NIST Special Databases. These databases consist of images of synthesized form documents. On the other hand, recently we developed a form database, samples of which had been taken from the real-life. ISIFormReader, a form processing system, also developed recently, has been tested using these real-life samples. An intensive study of the processing errors showed that writers' idiosyncracies are one of the major reasons of such errors as analyzed in U. Bhattacharya, et al., (2006). In the present paper, we investigated various other sources of errors which together cause a major concern. These include sample forms which are low in contrast, noisy, smudgy, skewed, scaled disturbing its aspect ratio and so on. An analysis of errors due to similar such sources is important towards development of an improved form processing system.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Design of Novel Reversible Carry Look-Ahead BCD Subtractor Design of a Framework for Handling Security Issues in Grids A Programmable Parallel Structure to perform Galois Field Exponentiation Voice Conversion by Prosody and Vocal Tract Modification Use of Instance Typicality for Efficient Detection of Outliers with Neural Network Classifiers
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1