Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.620626
Image and text coupling for creating electronic books from manuscripts
Laurent Robert, Laurence Likforman-Sulem, É. Lecolinet
Presents the first achievements of HERS (Hypermedia Edit and Read Station), which is devoted to the browsing and editing of hypermedia documents built from literary material, including document images. Concerning editing, our purpose is twofold. First, capabilities are offered to transcribe manuscripts: transcribing the text consists of coupling lines typed on the keyboard with their corresponding text lines in the manuscript images. A collaborative system, based on human-computer interaction and document analysis, is proposed for performing this task. Second, interactive tools are offered to organize the electronic document and establish hypermedia links between its different components (image areas, transcribed words or lines, or other kinds of heterogeneous data). Concerning browsing, we developed an approach based on information visualization in order to give users an idea of the overall organization of the hyperdocument and so help them navigate through it.
Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.620599
Table image segmentation
Konstantin Zuyev
An algorithm for table image segmentation, part of a complete document recognition system, is presented. The proposed approach introduces the concept of a table grid, which can serve as the basis for advanced methods of table structure analysis. It provides a layer of terminal symbols for the table, which is used by syntactical methods. A detailed discussion of grid detection is presented, which is performed through analysis of the connected-component projection profile. Simple rules for analyzing table structure cover the majority of real-life tables. The system is implemented, tested, and now extensively used in the FineReader OCR product.
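The grid-detection step described above can be illustrated with a toy sketch: project connected-component bounding boxes onto one axis and place grid lines in the empty gaps of the resulting profile. This is only an illustration of the idea, not the paper's actual algorithm; the function name, box format, and `min_gap` heuristic are assumptions.

```python
def grid_positions(boxes, axis, size, min_gap=3):
    """Toy sketch of projection-profile grid detection: project
    connected-component bounding boxes onto an axis and place grid lines
    in the gaps of the profile. `boxes` are (x0, y0, x1, y1) tuples;
    axis=0 projects onto x, axis=1 onto y. The paper's rules are richer."""
    profile = [0] * size
    for b in boxes:
        lo, hi = (b[0], b[2]) if axis == 0 else (b[1], b[3])
        for i in range(lo, hi + 1):
            profile[i] += 1               # components covering this coordinate
    lines, start = [], None
    for i, v in enumerate(profile):
        if v == 0 and start is None:
            start = i                     # a gap with no components begins
        elif v > 0 and start is not None:
            if i - start >= min_gap:      # gap wide enough to hold a grid line
                lines.append((start + i - 1) // 2)
            start = None
    return lines
```

For example, two clusters of cell components separated by white space along x yield a single vertical grid line centered in the gap between them.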
Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.620625
An Image Consulting Framework for document analysis of Internet graphics
M. Köppen, L. Lohmann, B. Nickolay
A new system approach for image understanding, called the Image Consulting Framework, is proposed. It allows for the validation of image properties; the kinds of properties considered are textual, textural, hierarchical, color and symbolic. Its main application field is information filtering from images used in World Wide Web documents. The framework consists of four stages: the color separation stage, the information granulation-verification modules (GVMs), the task stage and the recognition stage. At the base of the framework are the GVMs, which are designed to solve very specific tasks. They consist of three parts: a method maintainer, a parameter chooser and a tester (verifier). The parameter chooser uses a given set of parameter settings for different runs of the maintained method on the input images of the GVM. The resulting images are tested for the occurrence of the property for which the GVM is designed, and all successful images are put into a queue. The task stage invokes new GVMs as the queue fills, and it also assigns input images to the GVMs. All fully treated images are passed to the recognition stage, where the information extraction is performed.
Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.620651
Retrieval methods for English-text with missrecognized OCR characters
Manabu Ohta, A. Takasu, J. Adachi
This paper presents three probabilistic text retrieval methods designed to carry out a full-text search of English documents containing OCR errors. By searching for any query term on the premise that there are errors in the recognized text, the methods presented can tolerate such errors, so costly manual post-editing is not required after OCR recognition. In the applied approach, confusion matrices are used to store the characters likely to be interchanged when a particular character is misrecognized, together with the probability of each such substitution. Moreover, a 2-gram matrix is used to store probabilities of character connection, i.e., which letter is likely to come after another. Multiple search terms are generated for an input query term by reference to the confusion matrices, after which a full-text search is run for each search term. The validity of retrieved terms is determined based on error-occurrence and character-connection probabilities. The performance of these methods is experimentally evaluated by determining retrieval effectiveness, i.e., by calculating recall and precision rates. Results indicate marked improvement in comparison with exact matching.
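The confusion-matrix expansion step can be sketched as follows. The miniature table and the `min_prob` cutoff below are hypothetical; the paper's model additionally weighs candidates with 2-gram character-connection probabilities, which this sketch omits.

```python
from itertools import product

# Hypothetical miniature confusion table: observed char -> {intended char: prob}.
# Real confusion matrices are estimated from aligned OCR/ground-truth text.
CONFUSION = {
    "l": {"l": 0.90, "1": 0.06, "i": 0.04},
    "o": {"o": 0.95, "0": 0.05},
}

def expand_query(term, min_prob=0.01):
    """Generate alternative search terms for `term`, scoring each variant by
    the product of per-character confusion probabilities and dropping
    variants below `min_prob` (a simplification of the paper's model)."""
    choices = [list(CONFUSION.get(c, {c: 1.0}).items()) for c in term]
    variants = {}
    for combo in product(*choices):
        word = "".join(c for c, _ in combo)
        p = 1.0
        for _, q in combo:
            p *= q
        if p >= min_prob:
            variants[word] = max(p, variants.get(word, 0.0))
    # Highest-probability variants first; each is then searched in the text.
    return sorted(variants.items(), key=lambda kv: -kv[1])
```

A query for "lo" would also be searched as "1o", "l0" and "io", so an occurrence misrecognized by the OCR engine can still be retrieved.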
Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.619848
Generalized contextual recognition of hand-printed documents using semantic trees with lazy evaluation
L. Du, A. Downton, S. Lucas, Badr Al-Badr
Describes a new general-purpose contextual architecture which provides a unified framework for efficiently combining all types and levels of context in hand-print recognition applications. The architecture has been designed and built as a C++ class library and utilised within an initial demonstrator which implements full contextual constraints for a combination of postcode and corresponding postal address. Preliminary evaluation of the demonstrator suggests the system can substantially outperform previous context systems: its memory requirements are an order of magnitude less than those of an equivalent trie-based dictionary; its search speed is at least an order of magnitude faster than the trie, and actually improves as the dictionary size increases; and its error rate is virtually zero when suitable contextual constraints can be applied. Using this architecture, it appears possible to build real-time solutions to large-scale heterogeneous contextual problems.
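To see how contextual constraints prune dictionary search, here is a minimal trie-based sketch in which a branch is expanded only if the next character is among the recognizer's candidates for that position. This illustrates constraint-driven pruning only; the paper's semantic trees with lazy evaluation are a far more general (and more efficient) mechanism, and all names below are illustrative.

```python
def build_trie(words):
    """Build a nested-dict trie; "$" marks end of word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def search(trie, candidates, prefix=""):
    """Depth-first search of the trie, expanding a branch only when the
    next character is among the recognizer's candidate set for that
    position -- tighter constraints mean fewer branches to visit."""
    if not candidates:
        return [prefix] if "$" in trie else []
    out = []
    for ch in candidates[0]:
        if ch in trie:
            out += search(trie[ch], candidates[1:], prefix + ch)
    return out
```

With candidate sets ["c", "ao", "tr"], only the branches compatible with the per-position evidence are explored, and the surviving words are returned as contextual matches.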
Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.620603
Information capture and semantic indexing of digital libraries through machine learning techniques
F. Esposito, D. Malerba, G. Semeraro, Cesare Daniele Antifora, G. D. Gennaro
This paper presents a prototypical digital library service. It integrates machine learning tools and techniques in order to make the process of capturing the information to be stored and indexed by content in the digital library effective, efficient and economically feasible. In fact, information capture is one of the main bottlenecks when building a digital library, since it involves complex pattern recognition problems, such as document analysis, classification and understanding. Experimental results show that learning systems can solve these problems effectively and efficiently.
Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.619869
Variations on the analysis of architectural drawings
Christian Ah-Soon, K. Tombre
Lately, our team has begun investigating the analysis of architectural drawings; this paper presents our first results. After a brief introduction to the specificities of architectural drawings and a short review of existing work, we describe the low-level processing steps we have implemented: segmentation, vectorization, arc detection and loop detection. In the present state of our work, we have investigated two complementary techniques for higher-level analysis: one is based on geometric analysis and symbol recognition; the other relies on the idea that architecture is a combination of spaces, and is therefore based on spatial analysis. We present our current results with these two techniques and suggest a number of perspectives for the continuation of this work.
Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.620571
New features for Chinese character recognition
T. Caesar
The wide range of shape variations in Chinese characters requires an adequate representation of the discriminating features for classification. For the recognition of Latin characters or numerals, pixel values of a normalized raster image are adequate features for reaching very good recognition rates. But Chinese characters require a much higher resolution of the normalized raster image to enable discrimination of complex character shapes, which leads to a feature-space dimensionality whose computational cost is prohibitive for classification. Therefore, feature extraction algorithms are needed which capture the discriminative characteristics of character shapes in a compact form. Several such algorithms have been proposed in the past, many of them based on contour data. This paper introduces a contour-based approach which is very time-efficient and overcomes the problem of vanishing lines during anisotropic size normalization.
Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.620629
Confidence computation improvement in an optical field reading system
A. Benedetti, Z. Kovács-Vajna
A closed-form expression is derived for the recognition error vs. rejection rate of optical character or word recognition systems. This expression makes it possible to define a lower bound on the error rate of any recognition system employing a rejection process based on a confidence threshold. The relation has also proved useful for making a quantitative comparison between two confidence computation methods implemented in a system for reading USA Census '90 hand-written forms. The newly proposed method is based on a confidence model integrating single-character confidence levels, digram statistics and other information from the dictionary matching phase. At a 50% rejection rate, the field error rate calculated using the new confidence computation algorithm decreased from 47.7% to 44.6%, a considerable improvement given the theoretical lower bound of 40.8% on the error rate.
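The threshold-based rejection process that the bound applies to can be sketched empirically: given per-field confidences and correctness labels, sweeping the threshold traces out the error-vs-rejection trade-off curve. This is an illustrative measurement utility under assumed names; the paper's contribution is the closed-form relation, not this computation.

```python
def error_vs_rejection(results, threshold):
    """Sketch of confidence-threshold rejection. `results` is a list of
    (confidence, is_correct) pairs, one per recognized field; fields whose
    confidence falls below `threshold` are rejected for manual keying.
    Returns (rejection_rate, error_rate_on_accepted_fields)."""
    accepted = [ok for conf, ok in results if conf >= threshold]
    rejected = len(results) - len(accepted)
    reject_rate = rejected / len(results)
    error_rate = (sum(1 for ok in accepted if not ok) / len(accepted)
                  if accepted else 0.0)
    return reject_rate, error_rate
```

Raising the threshold rejects more fields but lowers the error rate on those accepted; a better confidence model moves this whole curve closer to the theoretical lower bound.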
Pub Date: 1997-08-18  DOI: 10.1109/ICDAR.1997.620613
Handwritten ZIP code recognition
Gregory I. Dzuba, Alexander Filatov, A. Volgunin
The encoding of the delivery point code (DPC) for a handwritten address is one of the most complex problems in US mail delivery automation. This paper describes a real-time system intended to recognize the 5-digit ZIP code part of the DPC. To increase system performance, the results of ZIP code recognition are cross-validated against those of city and state name recognition. The main principles of the handwritten word recognizer that forms the core of the system are explained. The system throughput is 40,000 address blocks per hour. Experimental results on live mail pieces are presented: the ZIP code recognition rate is 73% with a 1% error rate.
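The cross-validation idea can be sketched as a consistency check between the digit recognizer's ZIP candidates and the word recognizer's city/state candidates. The lookup table and function below are hypothetical stand-ins; a real system would consult a postal directory and combine recognizer confidences rather than take the first match.

```python
# Hypothetical directory fragment: ZIP -> (city, state).
# A real system would use a full USPS city/state/ZIP directory.
ZIP_TO_CITY_STATE = {
    "10001": ("NEW YORK", "NY"),
    "10601": ("WHITE PLAINS", "NY"),
}

def pick_zip(zip_candidates, city_candidates, state_candidates):
    """Sketch of cross-validation: among ZIP candidates produced by digit
    recognition, accept the first one whose directory entry agrees with a
    city candidate and a state candidate produced by word recognition.
    Returns None when no candidate can be validated (the field is rejected)."""
    for z in zip_candidates:
        entry = ZIP_TO_CITY_STATE.get(z)
        if entry and entry[0] in city_candidates and entry[1] in state_candidates:
            return z
    return None
```

Requiring agreement between independent recognizers is what keeps the error rate low (1% in the paper) at the cost of rejecting inconsistent readings.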