{"title":"Document understanding using probabilistic relaxation: application on tables of contents of periodicals","authors":"Frank Lebourgeois, H. Emptoz, S. Souafi-Bensafi","doi":"10.1109/ICDAR.2001.953841","DOIUrl":null,"url":null,"abstract":"This paper describes a statistical model for a document understanding system, which uses both text attributes and document layouts. Probabilistic relaxation is used as a recognition scheme to find the hierarchical structure of the logical layout. This approach, commonly used for pixels classification in image analysis, can be applied to classify text blocks into logical classes according to local compatibility with other neighboring blocks at different hierarchical levels. It provides a logical layout that is globally compatible with the training model. We have tested this approach on reading tables of contents of periodicals for documents indexing. Probabilistic relaxation has interesting properties like high-speed training and the 'a priori' recognition rate, which provides the consistency of the model according to the features used, and the samples chosen among the training set.","PeriodicalId":277816,"journal":{"name":"Proceedings of Sixth International Conference on Document Analysis and Recognition","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of Sixth International Conference on Document Analysis and Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2001.953841","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 26
Abstract
This paper describes a statistical model for a document understanding system, which uses both text attributes and document layouts. Probabilistic relaxation is used as a recognition scheme to find the hierarchical structure of the logical layout. This approach, commonly used for pixels classification in image analysis, can be applied to classify text blocks into logical classes according to local compatibility with other neighboring blocks at different hierarchical levels. It provides a logical layout that is globally compatible with the training model. We have tested this approach on reading tables of contents of periodicals for documents indexing. Probabilistic relaxation has interesting properties like high-speed training and the 'a priori' recognition rate, which provides the consistency of the model according to the features used, and the samples chosen among the training set.