Adaptive dewarping of severely warped camera-captured document images based on document map generation
International Journal on Document Analysis and Recognition
Pub Date: 2023-01-01 DOI: 10.1007/s10032-022-00425-4
C H Nachappa, N Shobha Rani, Peeta Basa Pati, M Gokulnath
Automated dewarping of camera-captured handwritten documents is a challenging research problem in Computer Vision and Pattern Recognition. Most available systems assume the shape of the camera-captured image boundaries to be anywhere between trapezoidal and octahedral, with linear distortion in areas between the boundaries for dewarping. The majority of state-of-the-art applications successfully dewarp simple-to-medium geometrical distortions when a user partially selects the control points. The proposed work implements a fully automated technique for control point detection across simple-to-complex geometrical distortions in camera-captured document images. The input image undergoes preprocessing, corner point detection, document map generation, and rendering of the dewarped document image. The proposed algorithm has been tested on five different camera-captured document datasets (one internal and four external, publicly available) consisting of 958 images. Both quantitative and qualitative evaluations have been performed to test the efficacy of the proposed system. On the quantitative front, the system achieves Intersection over Union (IoU) scores of 0.92, 0.88, and 0.80 for document map generation on the low-, medium-, and high-complexity datasets, respectively. Additionally, the accuracies of the recognized texts, obtained from a market-leading OCR engine, are used for a quantitative comparison of document images before and after the proposed enhancement. Finally, the qualitative analysis visually establishes the system's reliability by demonstrating improved readability even for severely distorted image samples.
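The IoU score quoted in the abstract above compares a predicted document region with a ground-truth region. As a minimal sketch of how such a score is commonly computed, assuming both regions are given as equally sized binary masks (the function name `mask_iou` and the toy masks are illustrative and not taken from the paper):

```python
def mask_iou(pred, truth):
    """Intersection over Union of two equally sized binary masks.

    Each mask is a list of rows, each row a list of 0/1 values.
    This representation is an assumption made for illustration only.
    """
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            intersection += 1 if (p and t) else 0  # pixel set in both masks
            union += 1 if (p or t) else 0          # pixel set in at least one mask
    # Two empty masks agree perfectly; treat that case as IoU = 1.0 by convention.
    return intersection / union if union else 1.0


# Toy example: 4 overlapping pixels, 8 pixels in the union.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
ground_truth = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(mask_iou(predicted, ground_truth))  # 0.5
```

A score of 1.0 means the two regions coincide exactly, and 0.0 means they do not overlap at all, so the reported 0.92, 0.88, and 0.80 indicate progressively looser agreement as dataset complexity increases.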
Editorial for special issue on "Advanced Topics in Document Analysis and Recognition"
Pub Date: 2021-01-01 Epub Date: 2021-08-10 DOI: 10.1007/s10032-021-00385-1
Josep Lladós, Daniel Lopresti, Seiichi Uchida
Locating and parsing bibliographic references in HTML medical articles
Pub Date: 2010-06-01 DOI: 10.1007/s10032-009-0105-9
Jie Zou, Daniel Le, George R Thoma
The set of references that typically appear toward the end of journal articles is sometimes, though not always, a field in bibliographic (citation) databases. But even if references do not constitute such a field, they can be useful as a preprocessing step in the automated extraction of other bibliographic data from articles, as well as in computer-assisted indexing of articles. Automation in data extraction and indexing to minimize human labor is key to the affordable creation and maintenance of large bibliographic databases. Extracting the components of references, such as author names, article title, journal name, publication date and other entities, is therefore a valuable and sometimes necessary task. This paper describes a two-step process that uses statistical machine learning algorithms first to locate the references in HTML medical articles and then to parse them. Reference locating identifies the reference section in an article and then decomposes it into individual references. We formulate this step as a two-class classification problem based on text and geometric features. An evaluation conducted on 500 articles drawn from 100 medical journals achieves near-perfect precision and recall rates for locating references. Reference parsing identifies the components of each reference. For this second step, we implement and compare two algorithms. One relies on sequence statistics and trains a Conditional Random Field. The other focuses on local feature statistics and trains a Support Vector Machine to classify each individual word, followed by a search algorithm that systematically corrects low-confidence labels if the label sequence violates a set of predefined rules. The overall performance of these two reference-parsing algorithms is about the same: above 99% accuracy at the word level, and over 97% accuracy at the chunk level.
Genre as noise
Pub Date: 2007-12-01 DOI: 10.2307/j.ctv125jncf.8
Andrea Stubbe, Christoph Ringlstetter, Klaus U. Schulz
Given a specific information need, documents of the wrong genre can be considered as noise. From this perspective, genre classification helps to separate relevant documents from noise. Orthographic...
Text line segmentation of historical documents: a survey
Pub Date: 2007-04-04 DOI: 10.5555/1237480.1237483
Laurence Likforman-Sulem, Abderrazak Zahour, Bruno Taconet
There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in mo...
Text line segmentation of historical documents
Pub Date: 2007-04-01 DOI: 10.5555/2722890.2723025
Laurence Likforman-Sulem, Abderrazak Zahour, Bruno Taconet
There is a huge amount of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in mo...
The recognition of handwritten numeral strings using a two-stage HMM-based method
Pub Date: 2003-04-01 DOI: 10.1007/s10032-002-0085-5
A. Britto, R. Sabourin, Flávio Bortolozzi
Adaptive image-smoothing using a coplanar matrix and its application to document image binarization
Pub Date: 2003-04-01 DOI: 10.1007/s10032-002-0098-0
Lixin Fan, Liying Fan, C. Tan
Special issue – selected papers from the ICDAR'01 conference
Pub Date: 2003-04-01 DOI: 10.1007/s10032-002-0093-5
A. Spitz, K. Tombre
Creating word-level language models for large-vocabulary handwriting recognition
Pub Date: 2003-04-01 DOI: 10.1007/s10032-002-0087-3
J. Pitrelli, Amit Roy