Document Image Retrieval to Support Reading Mokkans
Akihito Kitadai, Jun Takakura, Masatoshi Ishikawa, M. Nakagawa, Hajime Baba, Akihiro Watanabe
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.32
Abstract: This paper presents the design and implementation of a document image retrieval system to support the reading of mokkans. A mokkan is a wooden tablet with text written with a brush in India ink. Despite the archaeological and historical value of the mokkans excavated from ancient ruins, many of them have not yet been deciphered because their character patterns are lost or heavily damaged. Character recognition for damaged patterns helps decipher such mokkans. Furthermore, if the recognition results show not only the character codes but also the images of the character patterns and of the whole mokkans, recognition becomes a useful form of document retrieval for reconstructing the lost or unreadable parts. In our implementation, we built a public database of historical mokkans with their photographs, together with a character recognition module that runs on our support system to search the database. Evaluation by archaeologists is in progress.
The HCI Paradigm of HyperPrinting
T. Kieninger, A. Dengel
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.11
Abstract: Today, printing and reverse-printing technologies (scanning, OCR, logical labeling, etc.) have become quite mature and allow documents to move easily between the physical and electronic worlds. However, no current technology supports the lossless interpretation of paper-based user interaction with direct effects on the electronic representation of the document. The HyperPrinting environment fills this gap and accommodates the preferences of a majority of office workers: not only managers and knowledge workers prefer to read longer documents, articles, or news on paper rather than on a computer monitor or handheld device. With HyperPrinting, users can annotate documents, send notes, or initiate tasks, and it thus offers a new paradigm for the use and treatment of paper documents. As a side effect, using HyperPrinting builds a document repository that is searchable not only by full text but also by meta-information, which in turn depends on the selected user scenario.
Multi-Oriented English Text Line Extraction Using Background and Foreground Information
P. Roy, U. Pal, J. Lladós, F. Kimura
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.83
Abstract: Graphical documents (maps, engineering drawings), artistic documents, and many other printed materials contain text lines that are not parallel to each other but are multi-oriented or curved. To apply OCR to such documents, individual text lines must first be extracted, and extracting individual lines from multi-oriented and/or curved text is a difficult problem. In this paper, we propose a novel method to extract individual text lines from such document pages, based on the foreground and background information of the characters. The water reservoir concept is used to capture background information. In the proposed scheme, individual components are first detected and grouped into three-character clusters using their inter-component distance, size, and positional information. Using a graph-based approach, the initial three-character clusters are merged into larger clusters. Inter-character background information determines the orientations of the extreme characters of each larger cluster, and based on these orientations, two candidate regions are formed from the cluster. Finally, with the help of these candidate regions, individual lines are extracted. Our experiments yielded encouraging results.
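The grouping step above starts from detected components and merges those that are close relative to their size. As a rough illustration of that idea only (the threshold rule below is our own simplification, not the paper's distance/size/position criteria), components can be clustered with a union-find over pairwise center distances:

```python
import math

def cluster_components(boxes, dist_factor=1.5):
    """Group bounding boxes (x, y, w, h) whose centers lie within
    dist_factor times their average size (hypothetical threshold
    standing in for the paper's distance/size/position criteria)."""
    n = len(boxes)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    centers = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
    sizes = [max(w, h) for _, _, w, h in boxes]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(centers[i][0] - centers[j][0],
                           centers[i][1] - centers[j][1])
            if d <= dist_factor * (sizes[i] + sizes[j]) / 2:
                union(i, j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

In the paper the initial groups are fixed at three characters and then merged via a graph; the sketch above only shows the proximity-based agglomeration common to such schemes.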
State: A Multimodal Assisted Text-Transcription System for Ancient Documents
Albert Gordo, D. Llorens, A. Marzal, F. Prat, J. M. Vilar
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.28
Abstract: We present State, a complete assisted transcription system for ancient documents. The system consists of two applications: a pen-based, interactive application that assists humans in transcribing ancient documents, and a recognition engine that offers automatic transcriptions via a web service. We present the interaction model and the recognition algorithm employed in the current version of State. Preliminary experiments show the productivity gains obtained with the system when transcribing a document, as well as the error rate of the current recognition engine.
Unsupervised Decomposition of Color Document Images by Projecting Colors to a Spherical Surface
Yuan He, Jun Sun, S. Naoi, Y. Fujii, K. Fujimoto
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.37
Abstract: A decomposition method for color document images is proposed in this paper. A two-dimensional feature surface is constructed from the input color image, and a novel, unsupervised method based on contour lines is proposed to segment the surface. In detail, the colors of the image pixels are first projected onto a spherical surface whose center is the background color. This projection transforms the observed colors into the corresponding 'ideal' foreground colors. The spherical surface is then segmented into several non-overlapping regions, each corresponding to an individual layer of the input color document image. Finally, the image pixels are projected onto the spherical surface and classified into the corresponding layers.
A Two-Step Dewarping of Camera Document Images
N. Stamatopoulos, B. Gatos, I. Pratikakis, S. Perantonis
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.40
Abstract: Dewarping of camera document images has attracted considerable interest over the last few years, since warping not only reduces document readability but also affects the accuracy of OCR applications. In this paper, a two-step approach for efficient dewarping of camera document images is presented. In the first step, coarse dewarping is accomplished with the help of a transformation model that maps the projection of a curved surface to a 2D rectangular area. The projection of the curved surface is delimited by two curved lines, fitted to the top and bottom text lines, together with two straight lines fitted to the left and right text boundaries. In the second step, fine dewarping is achieved based on word detection: all words are pose-normalized, guided by their lower and upper baselines. Experimental results on several camera document images demonstrate the robustness and effectiveness of the proposed technique.
Symbol Descriptor Based on Shape Context and Vector Model of Information Retrieval
T.-O. Nguyen, S. Tabbone, O. R. Terrades
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.58
Abstract: In this paper we present an adaptive method for graphic symbol representation based on shape contexts. The proposed descriptor is based on interest points and is invariant under classical geometric transforms (rotation, scale). To reduce the complexity of matching a symbol against a large set of candidates, we use the popular vector model from information retrieval: from the set of shape descriptors we build a visual vocabulary, and each symbol is retrieved through its visual words. Experimental results on complex and occluded symbols show that the approach is very promising.
The Convergence of Iterated Classification
Chang An, H. Baird
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.52
Abstract: We report an improved methodology for training a sequence of classifiers for document image content extraction, that is, the location and segmentation of regions containing handwriting, machine-printed text, photographs, blank space, etc. The resulting segmentation is pixel-accurate, and so accommodates a wide range of zone shapes (not merely rectangles). We have systematically explored the best scale (spatial extent) of features. We have found that the methodology is sensitive to ground-truthing policy, and especially to the precision of ground-truth boundaries. Experiments on a diverse test set of 83 document images show that tighter ground truth reduces per-pixel classification errors by 45% (from 38.9% to 21.4%). Strong evidence, from both experiments and simulation, suggests that iterated classification converges region boundaries to the ground truth (i.e., they do not drift). Experiments show that four-stage iterated classifiers reduce error rates by 24%. We also present an analysis of special cases suggesting reasons why boundaries converge to the ground truth.
Word and Symbol Spotting Using Spatial Organization of Local Descriptors
Marçal Rusiñol, J. Lladós
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.24
Abstract: In this paper we present a method to spot both text and graphical symbols in a collection of images of wiring diagrams. Word spotting and symbol spotting methods tend to use the most discriminative features to describe the objects to be located, which makes it difficult to handle textual and symbolic information at the same time. We propose a spotting architecture, inspired by off-the-shelf object recognition architectures, that can index both words and symbols. Keypoints are extracted from a document image and a local descriptor is computed at each of these points of interest. The spatial organization of these descriptors validates the hypothesis that an object (text or symbol) is present at a certain location and under a certain pose.
On the Reading of Tables of Contents
Prateek Sarkar, E. Saund
2008 The Eighth IAPR International Workshop on Document Analysis Systems, 2008-09-16. DOI: 10.1109/DAS.2008.87
Abstract: This paper presents a framework for understanding tables of contents (TOCs) of books, journals, and magazines. We propose a universal logical structure representation in terms of a hierarchy of entries, each of which may contain a descriptor and a locator. We enumerate the graphical and perceptual cues that support parsing tables of contents in terms of this formalism. We make initial suggestions about the form of evaluation metrics for comparing ground-truthed tables of contents with the output of recognition algorithms. Typical and atypical tables of contents are used throughout to illustrate significant phenomena that must be dealt with in principled ways in any general TOC interpretation scheme. Finally, we discuss the implications of our observations for the design of recognition algorithms.