Clustering and classification of document structure - a machine learning approach
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.601965
A. Dengel, F. Dubiel
We describe a system capable of learning how the logical structures of documents are presented, demonstrated here for business letters. Given a set of training instances, the system clusters them into structural concepts and induces a concept hierarchy, which then serves as the source for classifying future input. The paper introduces the individual learning steps, describes how the resulting concept hierarchy is applied to logical labeling, and reports on the results.
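The abstract stays high-level, so as a hedged illustration of the clustering step, here is a minimal Python sketch that induces a concept hierarchy by agglomerative clustering of layout feature vectors and classifies new blocks against the concept prototypes. The features, thresholds, and use of SciPy linkage are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical layout features per logical object, normalized to the page:
# (x, y, width, height). Values and the cut threshold are illustrative.
instances = np.array([
    [0.10, 0.05, 0.30, 0.04],   # sender-like blocks
    [0.12, 0.06, 0.28, 0.04],
    [0.10, 0.20, 0.40, 0.08],   # recipient-like blocks
    [0.11, 0.21, 0.38, 0.07],
    [0.10, 0.45, 0.80, 0.30],   # body-like block
])

# Agglomerative clustering yields a dendrogram (the concept hierarchy);
# cutting it at a distance threshold produces the structural concepts.
hierarchy = linkage(instances, method="average")
concepts = fcluster(hierarchy, t=0.10, criterion="distance")

# Future input is classified against the concept prototypes (centroids).
prototypes = np.array([instances[concepts == c].mean(axis=0)
                       for c in np.unique(concepts)])

def classify(block):
    return int(np.argmin(np.linalg.norm(prototypes - block, axis=1)))
```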
{"title":"Clustering and classification of document structure-a machine learning approach","authors":"A. Dengel, F. Dubiel","doi":"10.1109/ICDAR.1995.601965","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.601965","url":null,"abstract":"We describe a system which is capable of learning the presentation of document logical structures, exemplarily shown for business letters. Presenting a set of instances to the system, it clusters them into structural concepts and induces a concept hierarchy. This concept hierarchy is taken as a source for classifying future input. The paper introduces the different learning steps, describes how the resulting concept hierarchy is applied for logical labeling and reports on the results.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127905751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Document image analysis using integrated image and neural processing
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.599005
D. Le, G. Thoma, H. Wechsler
In this paper we present robust algorithms for detecting the page orientation (portrait/landscape) and the skew angle of binary document images, and a method for classifying binary document images into textual and non-textual data blocks using neural network models. Four neural network models are compared in terms of training time, memory requirements, and classification accuracy; the radial basis function network performed best. The experiments show the feasibility of building an integrated document analysis system for page orientation and skew angle detection and for textual block classification.
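The abstract names the tasks but not the algorithms; the sketch below uses a common baseline, projection-profile variance maximization, as an assumed stand-in for the skew and orientation detectors, not necessarily the authors' method.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary_img, max_angle=5.0, step=0.25):
    """Pick the rotation whose horizontal projection profile has the
    sharpest peaks (largest variance): text lines align at that angle."""
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        rotated = rotate(binary_img.astype(float), angle,
                         reshape=False, order=0)
        score = rotated.sum(axis=1).var()   # row-wise ink counts
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

def page_orientation(binary_img):
    """Portrait vs. landscape: compare profile variance along each axis."""
    rows = binary_img.sum(axis=1).var()
    cols = binary_img.sum(axis=0).var()
    return "portrait" if rows >= cols else "landscape"
```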
{"title":"Document image analysis using integrated image and neural processing","authors":"D. Le, G. Thoma, H. Wechsler","doi":"10.1109/ICDAR.1995.599005","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.599005","url":null,"abstract":"In this paper we present robust algorithms for detecting the page orientation (portrait/landscape) and the degree of skew for binary document images, and a method for classification of binary document images into textual or non-textual data blocks using neural network models. The performance of four neural network models are compared in terms of training times, memory requirements, and classification accuracy, and it was found that the radial basis functions performed best. The experiments show the feasibility of building an integrated document analysis system for page orientation and skew angle detection, and textual block classification.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128767786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extraction of reference lines from documents with grey-level background using sub-images of wavelets
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.601961
Y. Tang, Hong Ma, Dihua Xi, Y. Cheng, C. Suen
Based on wavelets, a new theoretical method has been developed to process form documents. In this method, two-dimensional multiresolution analysis (MRA), the wavelet decomposition algorithm, and compactly supported orthonormal wavelets are used to transform a document image into sub-images. From these sub-images, the reference lines of forms can be extracted, and knowledge about the geometric structure of the document can be acquired. Experiments show that the new method can be applied to document processing with promising results.
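As a concrete, hedged picture of the decomposition step, the following sketch computes one level of a 2-D Haar wavelet transform in plain NumPy; horizontal rules concentrate energy in the sub-image that is low-pass along rows and high-pass along columns, so high-energy rows there are reference-line candidates. Haar is used only for brevity; the paper works with other compactly supported orthonormal wavelets.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar transform: returns the approximation
    (LL) and the three detail sub-images (LH, HL, HH)."""
    img = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2].astype(float)
    lo = (img[:, 0::2] + img[:, 1::2]) / 2    # row low-pass
    hi = (img[:, 0::2] - img[:, 1::2]) / 2    # row high-pass
    ll = (lo[0::2] + lo[1::2]) / 2
    lh = (lo[0::2] - lo[1::2]) / 2            # horizontal-edge response
    hl = (hi[0::2] + hi[1::2]) / 2            # vertical-edge response
    hh = (hi[0::2] - hi[1::2]) / 2
    return ll, lh, hl, hh

def horizontal_line_rows(img, rel_thresh=0.5):
    """Rows of the LH sub-image with high energy mark horizontal rules;
    the relative threshold is an illustrative choice."""
    _, lh, _, _ = haar_dwt2(img)
    energy = (lh ** 2).sum(axis=1)
    return np.nonzero(energy > rel_thresh * energy.max())[0] * 2
```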
{"title":"Extraction of reference lines from documents with grey-level background using sub-images of wavelets","authors":"Y. Tang, Hong Ma, Dihua Xi, Y. Cheng, C. Suen","doi":"10.1109/ICDAR.1995.601961","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.601961","url":null,"abstract":"Based on wavelets, a new theoretical method has been developed to process form documents. In this method, two-dimensional multiresolution analysis (MSA), wavelet decomposition algorithm, and compactly supported orthonormal wavelets are used to transform a document image into sub-images. According to these sub-images, the reference lines of forms can be extracted, and knowledge about the geometric structure of the document can be acquired. Experiments prove that this new method can be applied to process documents with promising results.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121000177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint feature and classifier design for OCR
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.602113
Dz-Mou Jung, G. Nagy
Shift-invariant, custom-designed n-tuple features are combined with a probabilistic decision tree to classify isolated printed characters. The feature probabilities are estimated using a novel compound Bayesian procedure in order to delay the fall-off in classification accuracy, with growing tree size, that a small sample set causes. On a ten-class confusion set of eight-point characters, the method yields error rates under 1% with only three training samples per class.
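For readers unfamiliar with n-tuple methods, this sketch shows the generic feature side: each tuple reads n pixel positions from the character bitmap and packs them into one discrete code. The random tuple sampling is a plain baseline; the paper custom-designs shift-invariant tuples and feeds the codes to a probabilistic decision tree, neither of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_tuples(shape, n_tuples=50, n=8):
    """Sample n pixel coordinates per tuple (random here; the paper
    custom-designs its tuples for shift invariance)."""
    return rng.integers(0, shape, size=(n_tuples, n, 2))

def ntuple_codes(binary_img, tuples):
    """Each tuple's n pixel values form the bits of one feature code."""
    codes = np.zeros(len(tuples), dtype=np.int64)
    for k, tup in enumerate(tuples):
        bits = binary_img[tup[:, 0], tup[:, 1]]
        codes[k] = int("".join("1" if b else "0" for b in bits), 2)
    return codes
```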
{"title":"Joint feature and classifier design for OCR","authors":"Dz-Mou Jung, G. Nagy","doi":"10.1109/ICDAR.1995.602113","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.602113","url":null,"abstract":"Shift-invariant, custom designed n-tuple features are combined with a probabilistic decision tree to classify isolated printed characters. The feature probabilities are estimated using a novel compound Bayesian procedure in order to delay the fall-off in classification accuracy with tree size due to a small sample set. On a ten-class confusion set of eight-point characters, the method yields error rates under 1% with only 3 training samples per class.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121501714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-automatic delineation of regions in floor plans
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.602062
Kathy Ryall, S. Shieber, J. Marks, Murray Mazer
We propose a technique that uses a proximity metric to delineate partially or fully bounded regions of a scanned bitmap depicting a building floor plan. A proximity field is defined over the bitmap and used both to identify the centers of subjective regions in the image and to assign pixels to regions by proximity. The region boundaries generated by the method tend to match the subjective boundaries of regions in the image well. We discuss the incorporation of the technique into a semi-automated interactive system for region identification in floor plans. In contrast to area-filling techniques for delineating areal regions of images, our approach works robustly for partially bounded regions. Furthermore, the frailties that do remain, unlike those of alternative techniques, are well moderated by simple human intervention.
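A hedged reading of the proximity field, sketched with SciPy: the distance transform of the free space gives the field, its local maxima give the subjective region centers, and every free pixel is assigned to its nearest center. Window sizes and thresholds are invented for illustration, and the paper's exact metric may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, label, maximum_filter

def delineate(wall_mask, peak_window=15, min_dist=3.0):
    """wall_mask: True where the bitmap has wall/line pixels."""
    # Proximity field: distance from each free pixel to the nearest wall.
    field = distance_transform_edt(~wall_mask)
    # Region centers: local maxima of the field, away from walls.
    peaks = (field == maximum_filter(field, size=peak_window)) \
            & (field > min_dist)
    centers, n_regions = label(peaks)
    # Assign every pixel the label of its nearest center pixel.
    _, (iy, ix) = distance_transform_edt(centers == 0, return_indices=True)
    regions = centers[iy, ix]
    regions[wall_mask] = 0          # walls belong to no region
    return regions, n_regions
```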
{"title":"Semi-automatic delineation of regions in floor plans","authors":"Kathy Ryall, S. Shieber, J. Marks, Murray Mazer","doi":"10.1109/ICDAR.1995.602062","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.602062","url":null,"abstract":"We propose a technique that uses a proximity metric for delineating partially or fully bounded regions of a scanned bitmap that depicts a building floor plan. A proximity field is defined over the bitmap, which is used both to identify the centers of subjective regions in the image and to assign pixels to regions by proximity. The region boundaries generated by the method tend to match well the subjective boundaries of regions in the image. We discuss incorporation of the technique in a semi-automated interactive system for region identification in floor plans. In contrast to area-filling techniques for delineating areal regions of images, our approach works robustly for partially bounded regions. Furthermore, the frailties of the method that do remain, unlike those of alternative techniques, are well-moderated by simple human intervention.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130156717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reading handwritten US census forms
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.598949
S. Madhvanath, V. Govindaraju, V. Ramanaprasad, Dar-Shyang Lee, S. Srihari
Commercial forms-reading systems for extracting data from forms do not meet acceptable accuracy requirements on forms filled out by hand. In December 1993, NIST invited industry and research organizations working in the area of handwriting recognition to participate in a test to determine the state of the art. A database of form images containing actual responses received by the US Census Bureau was provided. The handwritten responses are very loosely constrained in terms of writing style, response format, and choice of text. The lexicons provided are very large (about 50,000 entries), yet their coverage is incomplete (about 70%). In this paper we discuss the approach taken by CEDAR to automate the reading of the census forms, and describe the subtasks of field extraction and phrase recognition.
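The abstract stresses the large, incompletely covering lexicons; as a generic illustration (not CEDAR's recognizer), the sketch below ranks lexicon entries against raw recognizer output by edit distance, the standard way such a lexicon is exploited.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def best_lexicon_match(recognized, lexicon):
    """Rank lexicon entries by distance to the recognizer output;
    with ~50,000 entries, real systems prune, e.g. by word length."""
    return min(lexicon, key=lambda w: edit_distance(recognized, w))
```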
{"title":"Reading handwritten US census forms","authors":"S. Madhvanath, V. Govindaraju, V. Ramanaprasad, Dar-Shyang Lee, S. Srihari","doi":"10.1109/ICDAR.1995.598949","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.598949","url":null,"abstract":"Commercial forms-reading systems for extraction of data from forms do not meet acceptable accuracy requirements on forms filled out by hand. In December 1993, NIST called industry and research organizations working in the area of handwriting recognition to participate in a test to determine the state of the art in the area. A database of form images containing actual responses received by the US Census Bureau was provided. The handwritten responses are very loosely constrained in terms of writing style, format of response and choice of text. The sizes of the lexicons provided are very large (about 50000 entries) and yet the coverage is incomplete (about 70%). In this paper we discuss the approach taken by CEDAR to automate the task of reading the census forms. The subtasks of field extraction and phrase recognition are described.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122681781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Circular histogram thresholding for color image segmentation
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.601986
Din-Chang Tseng, Yao-Fu Li, Cheng-Tan Tung
A circular histogram thresholding method for color image segmentation is proposed. A circular hue histogram is first constructed based on a UCS (I,H,S) color space. The histogram is automatically smoothed by a scale-space filter, then transformed into a traditional (linear) histogram, and finally recursively thresholded based on the principle of maximum between-class variance. Three performance comparisons are reported: (i) the proposed thresholding on the circular histogram versus on a traditional histogram; (ii) the proposed thresholding versus clustering; and (iii) thresholding based on a UCS hue attribute versus a non-UCS hue attribute. The benefits of the proposed approach are confirmed in experiments.
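Assuming the variance criterion is Otsu-style between-class variance, here is a minimal sketch of the pipeline: histogram the hue channel, smooth circularly (the scale-space step reduced to a single Gaussian scale), open the circle at its deepest valley, and threshold. All parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def circular_otsu(hues, bins=256, sigma=3.0):
    """hues: hue values in degrees [0, 360). Returns a threshold bin."""
    hist, _ = np.histogram(hues, bins=bins, range=(0.0, 360.0))
    hist = gaussian_filter1d(hist.astype(float), sigma, mode="wrap")
    cut = int(np.argmin(hist))          # open the circle at its valley
    h = np.roll(hist, -cut)             # now a traditional histogram
    # Otsu: maximize between-class variance over all split points.
    p = h / h.sum()
    w = np.cumsum(p)
    mu = np.cumsum(p * np.arange(bins))
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * w - mu) ** 2 / (w * (1.0 - w))
    t = int(np.nanargmax(sigma_b[:-1]))
    return (cut + t) % bins             # map back to circular hue bins
```

Recursive thresholding, as in the paper, would reapply the same split to each resulting sub-histogram.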
{"title":"Circular histogram thresholding for color image segmentation","authors":"Din-Chang Tseng, Yao-Fu Li, Cheng-Tan Tung","doi":"10.1109/ICDAR.1995.601986","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.601986","url":null,"abstract":"A circular histogram thresholding for color image segmentation is proposed. A circular hue histogram is first constructed based on a UCS (I,H,S) color space. The histogram is automatically smoothed by a scale-space filter, then transformed into traditional histogram form, and finally recursively thresholded based on the maximum principle of variance. Three comparisons of performance are reported: (i) the proposed thresholding on the circular histogram with that on a traditional histogram; (ii) the proposed thresholding with clustering; and (iii) thresholding based on a UCS hue attribute with that based on a non-UCS hue attribute. Benefits of the proposed approach are confirmed in experiments.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122848665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A method for table structure analysis using DP matching
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.601964
Y. Hirayama
This paper presents a novel method for table structure analysis. Many documents contain table areas, and some contain both table and figure areas, so it is very important to classify table and figure areas automatically. Furthermore, within a table, the column and row in which a character string is located are very important pieces of information. To detect and analyze table areas, the following method is applied. First, areas that may contain tables or figures are distinguished from text areas by the presence of horizontal and vertical lines. Next, these areas are provisionally treated as table areas and analyzed as such; a judgment is then made on whether each area can in fact be a table, and in this way the actual table areas are detected. Finally, the structures of the table areas are analyzed and the character strings in them are arranged into columns and rows using DP matching. The method was applied to sixty-five pages from Japanese technical papers, magazines, and software manuals, containing 34 table areas, 48 line-drawing areas, and 35 image areas. As a result, 96.6 percent of the areas were detected correctly and 91.7 percent of the tables were analyzed and arranged correctly.
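The abstract does not spell out the DP matching, so the sketch below shows one plausible, assumed formulation: align the x-positions of a row's cell strings to a column template with a dynamic program that either matches a cell to a column or skips an empty column.

```python
def dp_align(cells, columns, gap_cost=1.0):
    """cells: x-positions of the strings in one table row;
    columns: x-positions of the table's columns (the template).
    Returns the total alignment cost; backtracking the same table
    recovers which column each string belongs to."""
    n, m = len(cells), len(columns)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m):
            if cost[i][j] == INF:
                continue
            # Skip column j (no string falls in it for this row).
            cost[i][j + 1] = min(cost[i][j + 1], cost[i][j] + gap_cost)
            # Match cell i to column j, penalized by positional distance.
            if i < n:
                c = cost[i][j] + abs(cells[i] - columns[j])
                cost[i + 1][j + 1] = min(cost[i + 1][j + 1], c)
    return cost[n][m]
```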
{"title":"A method for table structure analysis using DP matching","authors":"Y. Hirayama","doi":"10.1109/ICDAR.1995.601964","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.601964","url":null,"abstract":"This paper presents a novel method for table structure analysis. Many documents have table areas, and some have both table and figure areas. It is very important to be able to classify table and figure areas automatically. Furthermore, in tables, the column and row in which a character string is located are very important pieces of information. To detect and analyze table areas, the following method is applied: First, areas that may contain tables or figures are distinguished from text areas by the presence of horizontal and vertical lines. Next, the areas are assumed to be table areas and are analyzed as such. A judgment is made on whether each of the areas can in fact be a table area or not; in this way, the actual table areas are detected. Finally, the structures of the areas are analyzed and character strings in the areas are arranged by using the DP matching method. This method was applied to sixty-five pages of Japanese technical papers, magazines, manuals for software programs, and pages including 34 table areas, 48 line drawing areas, and 35 image areas. As a result, 96.6 percent of the areas were detected correctly and 91.7 percent of the tables were analyzed and arranged correctly.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127806687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model matching in intelligent document understanding
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.598997
Gary S. D. Farrow, C. Xydeas, J. Oakley
Intelligent Document Understanding (IDU) is the process of converting scanned document pages into an electronic, processable form. We have previously presented an IDU system architecture suitable for this task which uses a hybrid bottom-up/top-down control strategy. In this paper we focus on a specific subproblem that arises within the chosen framework: selecting an appropriate page layout structure. A detailed analysis of the problem using an error propagation model allows computationally simple search strategies to be developed. A multistage layout formation algorithm is proposed and its performance is critically assessed when implemented using two different Layout Object selection criteria. The first criterion is based on maximal column area coverage; the second on probabilistic Layout Object selection. Both techniques have been incorporated into the hybrid IDU system, and the results presented indicate its superiority over previously reported systems.
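As a hedged reading of the first criterion, maximal column area coverage, here is a greedy sketch that repeatedly keeps the candidate Layout Object adding the most uncovered area. The rectangle representation and the greedy (rather than exact) search are assumptions made for illustration.

```python
def select_by_coverage(candidates):
    """candidates: axis-aligned rectangles (x0, y0, x1, y1).
    Greedily pick objects by marginal area gain. The gain estimate
    ignores mutual overlap among already-chosen objects, which is
    adequate when selected objects are mostly disjoint."""
    def area(r):
        return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

    def inter(a, b):
        return (max(a[0], b[0]), max(a[1], b[1]),
                min(a[2], b[2]), min(a[3], b[3]))

    chosen, pool = [], list(candidates)
    while pool:
        gain = lambda r: area(r) - sum(area(inter(r, c)) for c in chosen)
        best = max(pool, key=gain)
        if gain(best) <= 0:
            break                   # nothing left adds new coverage
        chosen.append(best)
        pool.remove(best)
    return chosen
```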
{"title":"Model matching in intelligent document understanding","authors":"Gary S. D. Farrow, C. Xydeas, J. Oakley","doi":"10.1109/ICDAR.1995.598997","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.598997","url":null,"abstract":"Intelligent Document Understanding (IDU) is the process of converting scanned document pages into an electronic, processable form. We have previously presented a IDU system architecture suitable for this task which uses a hybrid bottom-up/top-down control strategy. In this paper we focus on a specific subproblem that arises within the chosen framework, concerned with selecting an appropriate page layout structure. A detailed analysis of the problem using an error propagation model, allows computationally simple search strategies to be developed. A multistage layout formation algorithm is proposed and its performance is critically assessed when implemented using two different Layout Object selection criterion. The first selection criterion is based on a maximal column area coverage; the second is based on a probabilistic Layout Object selection. Both techniques have been incorporated into the hybrid IDU system and the results presented indicate its superiority over previously reported systems.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128008175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A knowledge-based approach to the layout analysis
Pub Date: 1995-08-14 | DOI: 10.1109/ICDAR.1995.599037
F. Esposito, D. Malerba, G. Semeraro
In this paper, we present a hybrid approach to document analysis in which the document image is segmented by a top-down technique and basic blocks are then grouped bottom-up to form complex layout components. In this latter process, called layout analysis, only generic knowledge of typesetting conventions is exploited. Such knowledge is independent of the particular class of documents being processed and turns out to be valuable for a wide range of documents. Preliminary results of the layout analysis system LEX (Layout EXpert) show the methodological validity of this approach.
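One generic typesetting rule of the kind the abstract alludes to, sketched under assumed thresholds: merge vertically adjacent basic blocks whose horizontal extents overlap and whose gap is below a class-independent limit. This is an illustration of bottom-up grouping, not LEX's actual rule set.

```python
def group_blocks(blocks, max_gap=12):
    """blocks: (x0, y0, x1, y1) with y growing downward. Repeatedly
    merge pairs that overlap horizontally and sit within max_gap
    pixels vertically; max_gap is an illustrative value."""
    blocks = list(blocks)
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                a, b = blocks[i], blocks[j]
                h_overlap = min(a[2], b[2]) > max(a[0], b[0])
                v_gap = max(a[1], b[1]) - min(a[3], b[3])
                if h_overlap and v_gap <= max_gap:
                    blocks[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del blocks[j]
                    merged = True
                    break
            if merged:
                break
    return blocks
```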
{"title":"A knowledge-based approach to the layout analysis","authors":"F. Esposito, D. Malerba, G. Semeraro","doi":"10.1109/ICDAR.1995.599037","DOIUrl":"https://doi.org/10.1109/ICDAR.1995.599037","url":null,"abstract":"In this paper, we present a hybrid approach to the problem of the document analysis in which the document image is segmented by means of a top-down technique and then basic blocks are grouped bottom-up in order to form complex layout components. In this latter process, called layout analysis, only generic knowledge on typesetting conventions is exploited. Such knowledge is independent of the particular class of processed documents and turns out to be valuable for a wide range of documents. Preliminary results of the layout analysis system LEX (Layout EXpert) show the methodological validity of this approach.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128071116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}