S. Ferilli, D. Grieco, Domenico Redavid, F. Esposito
{"title":"Abstract argumentation for reading order detection","authors":"S. Ferilli, D. Grieco, Domenico Redavid, F. Esposito","doi":"10.1145/2644866.2644883","DOIUrl":null,"url":null,"abstract":"Detecting the reading order among the layout components of a document's page is fundamental to ensure effectiveness or even applicability of subsequent content extraction steps. While in single-column documents the reading flow can be straightforwardly determined, in more complex documents the task may become very hard. This paper proposes an automatic strategy for identifying the correct reading order of a document page's components based on abstract argumentation. The technique is unsupervised, and works on any kind of document based only on general assumptions about how humans behave when reading documents. Experimental results show that it is effective in more complex cases, and requires less background knowledge, than previous solutions that have been proposed in the literature.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"47 1","pages":"45-48"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2644866.2644883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Detecting the reading order among the layout components of a document's page is fundamental to ensure effectiveness or even applicability of subsequent content extraction steps. While in single-column documents the reading flow can be straightforwardly determined, in more complex documents the task may become very hard. This paper proposes an automatic strategy for identifying the correct reading order of a document page's components based on abstract argumentation. The technique is unsupervised, and works on any kind of document based only on general assumptions about how humans behave when reading documents. Experimental results show that it is effective in more complex cases, and requires less background knowledge, than previous solutions that have been proposed in the literature.