
Latest publications: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

N-gram and N-class models for on line handwriting recognition
Freddy Perraud, C. Viard-Gaudin, E. Morin, P. Lallican
This paper highlights the interest of a language model in increasing the performance of on-line handwriting recognition systems. Models based on statistical approaches, trained on written corpora, have been investigated. Two kinds of models have been studied: n-gram models and n-class models. In the latter case, the classes result either from a syntactic criterion or a contextual criterion. In order to integrate it into small-capacity systems (mobile devices), an n-class model has been designed by combining these criteria. It outperforms bulkier models based on n-grams. Integration into an on-line handwriting recognition system demonstrates a substantial performance improvement due to the language model.
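The factorization behind an n-class model can be sketched in a few lines: word-bigram probabilities are approximated by class-transition probabilities times within-class word probabilities, which keeps the model small enough for a mobile device. Everything below (the toy corpus, the `word2class` map) is illustrative, not taken from the paper:

```python
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Bigram class model: P(w_i | w_i-1) is approximated by
    P(class_i | class_i-1) * P(w_i | class_i), so the model stores
    class-level statistics instead of a full word-bigram table."""
    class_bigrams = Counter()   # (prev_class, class) counts
    class_counts = Counter()    # class occurrence counts
    word_in_class = Counter()   # (class, word) counts
    for sent in sentences:
        classes = [word2class[w] for w in sent]
        for w, c in zip(sent, classes):
            class_counts[c] += 1
            word_in_class[(c, w)] += 1
        for c1, c2 in zip(classes, classes[1:]):
            class_bigrams[(c1, c2)] += 1

    def prob(prev, word):
        c_prev, c = word2class[prev], word2class[word]
        p_class = class_bigrams[(c_prev, c)] / max(1, class_counts[c_prev])
        p_word = word_in_class[(c, word)] / max(1, class_counts[c])
        return p_class * p_word

    return prob
```

The class table grows with the number of classes squared rather than the vocabulary squared, which is the size argument made in the abstract.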
DOI: 10.1109/ICDAR.2003.1227818
Citations: 19
A vector approach for automatic interpretation of the French cadastral map
Jean-Marc Viglino, M. Pierrot-Deseilligny
This paper deals with a cadastral map interpretation device. The challenge is to propose a complete reconstruction of parcel areas and buildings for use with geographic information systems. It is based on low-level primitive extraction and classification. As this low level may be quite noisy, an interpretation process classifies medium-level objects and applies processing suited to each extracted shape. Then, a reconstruction step is used to label the parcel areas and determine the final land partition. We first present the vectorization strategy in our particular context, then discuss the different tools used to reach the higher level.
DOI: 10.1109/ICDAR.2003.1227678
Citations: 10
Combining model-based and discriminative classifiers: application to handwritten character recognition
L. Prevost, C. Michel-Sendis, A. Moises, L. Oudot, M. Milgram
Handwriting recognition is such a complex classification problem that it is now quite usual to make several classification methods cooperate at the pre-processing stage or at the classification stage. In this paper, we present an original two-stage recognizer. The first stage is a model-based classifier that stores an exhaustive set of character models. The second stage is a discriminative classifier that separates the most ambiguous pairs of classes. This hybrid architecture is based on the idea that the correct class almost systematically belongs to the two most relevant classes found by the first classifier. Experiments on the Unipen database show a 30% improvement on a 62-class recognition problem.
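The two-stage idea — a model-based ranking followed by a discriminative decision restricted to the top two candidates — can be sketched as follows. The prototype storage and the pairwise-classifier interface are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def two_stage_classify(x, prototypes, pairwise):
    """Stage 1 (model-based): rank classes by distance to stored prototypes.
    Stage 2 (discriminative): a dedicated pairwise classifier separates the
    two most relevant classes found by stage 1."""
    dists = {c: min(np.linalg.norm(x - p) for p in ps)
             for c, ps in prototypes.items()}
    best, second = sorted(dists, key=dists.get)[:2]
    pair = tuple(sorted((best, second)))
    if pair in pairwise:               # fall back to stage 1 if no pair model
        return pairwise[pair](x)
    return best
```

Only ambiguous class pairs need a trained discriminator; unambiguous inputs are settled by stage 1 alone, which keeps the second stage small.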
DOI: 10.1109/ICDAR.2003.1227623
Citations: 32
Fast lexicon-based word recognition in noisy index card images
S. Lucas, Gregory Patoulas, A. Downton
This paper describes a complete system for reading type-written lexicon words in noisy images - in this case museum index cards. The system is conceptually simple, and straightforward to implement. It involves three stages of processing. The first stage extracts row-regions from the image, where each row is a hypothesized line of text. The next stage scans an OCR classifier over each row image, creating a character hypothesis graph in the process. This graph is then searched using a priority-queue based algorithm for the best matches with a set of words (lexicon). Performance evaluation on a set of museum archive cards indicates competitive accuracy and also reasonable throughput. The priority queue algorithm is over two hundred times faster than using flat dynamic programming on these graphs.
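The priority-queue search over a character hypothesis graph can be sketched like this. The graph encoding (edges as `(next_node, char, cost)` lists keyed by start node) is an assumption for illustration; with non-negative costs, the first complete lexicon word popped from the queue is the best match, which is why this beats exhaustive dynamic programming:

```python
import heapq

def best_word(graph, start, goal, lexicon):
    """Best-first search of a character hypothesis graph.
    States are (cost, node, prefix); only prefixes of lexicon words are
    expanded, and the first complete word popped at the goal node is
    optimal because edge costs are non-negative."""
    prefixes = {w[:i] for w in lexicon for i in range(len(w) + 1)}
    heap = [(0.0, start, "")]
    seen = set()
    while heap:
        cost, node, prefix = heapq.heappop(heap)
        if node == goal and prefix in lexicon:
            return prefix, cost
        if (node, prefix) in seen:
            continue
        seen.add((node, prefix))
        for nxt, ch, c in graph.get(node, []):
            if prefix + ch in prefixes:
                heapq.heappush(heap, (cost + c, nxt, prefix + ch))
    return None, float("inf")
```

Pruning against the lexicon prefix set is what keeps the frontier small on noisy graphs with many low-confidence character hypotheses.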
DOI: 10.1109/ICDAR.2003.1227708
Citations: 16
Detection of matrices and segmentation of matrix elements in scanned images of scientific documents
T. Kanahori, M. Suzuki
We proposed a method for recognizing matrices which contain abbreviation symbols, together with a format for representing the structure of matrices, and reported experimental results in our paper (2002). The method consisted of four processes: detection of matrices, segmentation of elements, construction of networks and analysis of the matrix structure. In that paper, our work was described with a focus on the construction of networks and the analysis of the matrix structure. However, we concluded that improvements in the other two processes were very important for obtaining a high recognition accuracy rate. In this paper, we describe the two improved processes, the detection of matrices and the segmentation of elements, and we report the experimental results.
DOI: 10.1109/ICDAR.2003.1227704
Citations: 6
Postal envelope segmentation by 2-D histogram clustering through watershed transform
Eduardo Akira Yonekura, J. Facon
In this paper we present a new postal envelope segmentation method based on 2-D histogram clustering and the watershed transform. The segmentation task consists in detecting the modes associated with homogeneous regions in envelope images, such as the handwritten address block, postmarks, stamps and background. The homogeneous modes in the 2-D histogram are segmented through the morphological watershed transform. Our approach is applied to complex Brazilian postal envelopes. Very little a priori knowledge of the envelope images is required. The advantages of this approach are described and illustrated with tests carried out on 300 different images, where there is no fixed position for the handwritten address block, postmarks and stamps.
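A marker-based watershed of the kind applied here can be sketched without library dependencies. This toy immersion loop grows each marker's region into unlabeled neighbours in order of increasing height; real implementations use an ordered flooding queue, so treat this purely as an illustration of the idea:

```python
import numpy as np

def watershed_2d(surface, markers):
    """Toy marker-based watershed: repeatedly sweep the pixels in order
    of increasing height and let each labeled region claim unlabeled
    4-neighbours. `markers` is an int array, 0 = unlabeled."""
    labels = markers.copy()
    h, w = surface.shape
    order = np.argsort(surface, axis=None)      # pixels by ascending height
    changed = True
    while changed:
        changed = False
        for idx in order:
            i, j = divmod(int(idx), w)
            if labels[i, j] != 0:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and labels[ni, nj] != 0:
                    labels[i, j] = labels[ni, nj]
                    changed = True
                    break
    return labels
```

In the paper's setting the "surface" is the (inverted) 2-D colour histogram and the markers are its detected modes, so each histogram bin ends up assigned to one homogeneous region.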
DOI: 10.1109/ICDAR.2003.1227685
Citations: 9
String extraction from color airline coupon image using statistical approach
Yi Li, Zhiyan Wang, Haizan Zeng
A novel technique is presented in this paper to extract strings in color images of both business settlement plan (BSP) and non-BSP airline coupons. The essential concept is to remove non-text pixels from complex coupon images, rather than extracting strings directly. First we transform color images from RGB to HSV space, which is approximately uniform, and then remove the black component of images using the properties of HSV space. A statistical approach called principal component analysis (PCA) is applied to extract strings by removing the background decorative pattern based on a priori knowledge of the environment. Finally, a method to validate and improve performance is presented.
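The PCA step can be illustrated as follows: project pixel colour vectors onto the first principal component and threshold the scores. The two-cluster toy data and the zero threshold are illustrative assumptions; note that the eigenvector's sign (and hence which side of the threshold is "background") is arbitrary:

```python
import numpy as np

def pca_split(pixels, threshold=0.0):
    """Project colour vectors onto the first principal component (the
    direction of largest variance) and threshold the scores, separating
    the dominant background pattern from the remaining text pixels."""
    X = pixels - pixels.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues ascending
    pc1 = eigvecs[:, -1]                     # largest-eigenvalue direction
    return (X @ pc1) > threshold
```

The decorative pattern dominates the colour variance of a coupon image, so it concentrates along the first principal axis and can be split off by a single projection.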
DOI: 10.1109/ICDAR.2003.1227675
Citations: 8
Using tree-grammars for training set expansion in page classification
Stefano Baldi, S. Marinai, G. Soda
In this paper we describe a method for the expansion of training sets made by XY trees representing page layout. This approach is appropriate when dealing with page classification based on MXY tree page representations. The basic idea is the use of tree grammars to model the variations in the tree which are caused by segmentation algorithms. A set of general grammatical rules are defined and used to expand the training set. Pages are classified with a k-NN approach where the distance between pages is computed by means of tree-edit distance.
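Tree-edit distance between layout trees can be sketched with the classic forest recursion (exponential without memoization, fine for small trees). The `(label, children)` tuple encoding is an illustrative assumption, not the paper's MXY tree format:

```python
from functools import lru_cache

# A tree is (label, (child, child, ...)); a forest is a tuple of trees.

def forest_size(forest):
    return sum(1 + forest_size(t[1]) for t in forest)

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Ordered-forest edit distance with unit insert/delete/relabel costs:
    either delete the rightmost root of f1, insert the rightmost root of
    f2, or match the two rightmost roots and recurse on their children."""
    if not f1:
        return forest_size(f2)
    if not f2:
        return forest_size(f1)
    t1, t2 = f1[-1], f2[-1]
    return min(
        forest_dist(f1[:-1] + t1[1], f2) + 1,   # delete rightmost root of f1
        forest_dist(f1, f2[:-1] + t2[1]) + 1,   # insert rightmost root of f2
        forest_dist(f1[:-1], f2[:-1])
        + forest_dist(t1[1], t2[1])
        + (t1[0] != t2[0]),                     # relabel if labels differ
    )

def tree_dist(t1, t2):
    return forest_dist((t1,), (t2,))
```

A k-NN page classifier then just sorts the (expanded) training trees by `tree_dist` to the query tree and votes among the k closest.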
DOI: 10.1109/ICDAR.2003.1227778
Citations: 24
A multiclass classification method based on multiple pairwise classifiers
Tomoyuki Hamamura, H. Mizutani, Bunpei Irie
In this paper, a new method of composing a multi-class classifier using pairwise classifiers is proposed. A "Resemblance Model" is exploited to calculate a posteriori probabilities for combining pairwise classifiers. We proved the validity of this model by using an approximation of the a posteriori probability formula. Using this theory, we can obtain the optimal decision. An experimental result on handwritten numeral recognition is presented, supporting the effectiveness of our method.
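The paper's Resemblance Model is not reproduced here. As a generic illustration of the underlying problem — turning pairwise posteriors into multiclass ones — this sketch uses a classical coupling rule, p_i ∝ 1 / (Σ_{j≠i} 1/r_ij − (K − 2)), which recovers the posteriors exactly when the pairwise estimates are mutually consistent:

```python
import numpy as np

def couple_pairwise(r):
    """Combine pairwise posteriors r[i, j] = P(class i | class i or j, x)
    into multiclass posteriors. The coupling rule below is exact whenever
    r[i, j] = p_i / (p_i + p_j) for some distribution p; the diagonal of
    r is ignored."""
    K = r.shape[0]
    p = np.empty(K)
    for i in range(K):
        s = sum(1.0 / r[i, j] for j in range(K) if j != i)
        p[i] = 1.0 / (s - (K - 2))
    return p / p.sum()
```

With noisy pairwise estimates the outputs are only approximate posteriors, which is why methods such as the paper's model refine this combination step.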
DOI: 10.1109/ICDAR.2003.1227774
Citations: 25
Stippling data on backgrounds of pages - toward seamless integration of paper and electronic documents
K. Kise, Yasuo Miki, Keinosuke Matsumoto
In order to realize seamless integration of paper and electronic documents, it is at least necessary to assure error-free conversion from one to the other. In general, the conversion from paper to electronic documents is the task of document image understanding. Although its research has made remarkable progress, it is still a hard task without limiting the type of documents. This paper presents a completely different approach to this task on condition that printed documents have their originals in electronic form. The proposed method employs fine dots to represent data of electronic documents and places the dots on white space (backgrounds) of pages. Since the data is encoded with an error-correcting code, it is guaranteed to be correctly recovered from the scanned images of documents. Experimental results show that a page with normal foreground objects (characters and other things) can contain more than 4KB of data, even when errors up to 20% of the data are permitted.
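The error-correcting layer can be illustrated with the simplest possible code. The paper's actual code is not specified in the abstract, so this sketch uses bit repetition with majority-vote decoding, where any single misread dot per group is corrected:

```python
def encode_dots(data, rep=3):
    """Turn bytes into a dot stream: each bit (LSB first) is repeated
    `rep` times so scattered read errors can be outvoted on decoding."""
    bits = [(byte >> k) & 1 for byte in data for k in range(8)]
    return [b for b in bits for _ in range(rep)]

def decode_dots(dots, rep=3):
    """Majority-vote each group of `rep` dots, then repack bits into bytes."""
    bits = [int(2 * sum(dots[i:i + rep]) > rep)
            for i in range(0, len(dots), rep)]
    out = bytearray()
    for i in range(0, len(bits), 8):
        out.append(sum(bit << k for k, bit in enumerate(bits[i:i + 8])))
    return bytes(out)
```

A repetition code is wasteful in capacity; practical stippling schemes would use a stronger block code, but the guarantee is the same: the data survives as long as the per-block error rate stays under the code's correction radius.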
DOI: 10.1109/ICDAR.2003.1227850
Citations: 0