Ruling analysis and classification of torn documents
Markus Diem, Florian Kleber, Robert Sablatnig
Proceedings of the ACM Symposium on Document Engineering, pp. 63-72, 2014-09-16. DOI: 10.1145/2644866.2644876
This paper presents a method for classifying the ruling of torn document snippets. State-of-the-art methods focus on removing ruling lines. Here, by contrast, the ruling lines are analyzed in order to cluster documents when reassembling document snippets. First, a background patch is extracted from each snippet at the position that minimizes the inscribed content. A novel Fourier feature is then computed on this image patch, and Support Vector Machines (SVMs) classify the snippet as void, lined, or checked. Finally, the ruling lines are localized accurately using projection profiles and robust line fitting. On a dataset of real-world document snippets, the ruling classification achieves an F-score of 0.987. Line removal was also evaluated on a synthetically generated dataset, where it achieves an F-score of 0.931. This dataset is publicly available for benchmarking.
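The first step selects the background patch with the least inscribed content. It can be sketched as a sliding-window search. This is a minimal sketch under stated assumptions, not the authors' implementation:

- The snippet is a grayscale array scaled to [0, 1] with dark ink.
- Every pixel below a hypothetical `ink_threshold` counts as content.
- A summed-area table makes each square window's ink count an O(1) lookup.

The function name `select_background_patch` and its parameters are illustrative.

```python
import numpy as np


def select_background_patch(img, size, ink_threshold=0.5):
    """Find the size x size window with the fewest ink pixels.

    Hypothetical helper: assumes `img` is a 2-D grayscale array in [0, 1]
    where ink is darker than `ink_threshold`. Returns (row, col, ink_count)
    for the top-left corner of the best window; ties go to the first window
    in row-major order.
    """
    img = np.asarray(img, dtype=float)
    if img.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    h, w = img.shape
    if not 1 <= size <= min(h, w):
        raise ValueError("patch size must fit inside the snippet")
    ink = (img < ink_threshold).astype(np.int64)
    # Summed-area table padded with a leading zero row and column:
    # sat[i, j] is the number of ink pixels in img[:i, :j].
    sat = np.zeros((h + 1, w + 1), dtype=np.int64)
    sat[1:, 1:] = ink.cumsum(axis=0).cumsum(axis=1)
    # Ink count of every window at once, indexed by its top-left corner.
    counts = (sat[size:, size:] - sat[:-size, size:]
              - sat[size:, :-size] + sat[:-size, :-size])
    r, c = np.unravel_index(np.argmin(counts), counts.shape)
    return int(r), int(c), int(counts[r, c])
```

The summed-area table keeps the search linear in the number of pixels regardless of the patch size. A real system might weight ink by darkness or by distance from the snippet border; the sketch only counts dark pixels.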
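The abstract does not define the Fourier feature. What follows is therefore only a hedged illustration of the general idea: ruling lines appear as a periodic signal in a patch's projection profiles.

The paper classifies its feature with Support Vector Machines. This sketch substitutes a plain threshold on the strength of the dominant non-DC frequency in the row and column profiles. `spectral_peak`, `classify_ruling`, and the default `threshold=0.5` are hypothetical. The threshold would need tuning, or replacement by a trained SVM, on real snippets.

```python
import numpy as np


def spectral_peak(profile, eps=1e-9):
    """Strength and period of the dominant non-DC frequency of a 1-D profile.

    Strength is the magnitude of the peak bin divided by the norm of all
    non-DC bins, so it lies in (0, 1]; a flat profile returns (0.0, None).
    """
    profile = np.asarray(profile, dtype=float)
    if profile.max() - profile.min() < eps:
        return 0.0, None  # no variation, hence no ruling signal
    spectrum = np.abs(np.fft.rfft(profile - profile.mean()))
    spectrum[0] = 0.0  # discard whatever DC residue is left after centering
    k = int(np.argmax(spectrum))  # k >= 1 because the profile varies
    strength = float(spectrum[k] / np.linalg.norm(spectrum))
    return strength, len(profile) / k  # period in pixels


def classify_ruling(patch, threshold=0.5):
    """Toy void / lined / checked decision (stand-in for the paper's SVM).

    Horizontal ruling lines modulate the row profile (mean over columns);
    vertical lines, as on checked paper, modulate the column profile.
    Returns (label, row_period, col_period), with None for non-periodic axes.
    """
    patch = np.asarray(patch, dtype=float)
    row_s, row_p = spectral_peak(patch.mean(axis=1))
    col_s, col_p = spectral_peak(patch.mean(axis=0))
    row_ok, col_ok = row_s >= threshold, col_s >= threshold
    label = ("void", "lined", "checked")[int(row_ok) + int(col_ok)]
    return label, (row_p if row_ok else None), (col_p if col_ok else None)
```

Thin, single-pixel lines spread their energy evenly across the harmonics of the line spacing, which lowers the peak strength. Lines that are wider, or blurred by scanning, keep most of their energy in the fundamental. For such lines, the reported period matches the line spacing in pixels.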
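For the final localization step, one plausible reading of "projection profiles and robust line fitting" is:

1. Take a band around a line position found in the row profile.
2. Record the darkest row in each column of that band.
3. Fit a straight line that tolerates outliers, such as handwriting crossing the ruling.

The sketch below uses an exhaustive RANSAC-style search over point pairs, followed by a least-squares refit on the inliers via `numpy.polyfit`. This is a substitute technique chosen for illustration, not necessarily the paper's fitting method. `localize_line`, `robust_line_fit`, `half_band`, and `tol` are hypothetical names.

```python
import itertools

import numpy as np


def robust_line_fit(xs, ys, tol=1.0):
    """Fit y = a*x + b while ignoring outliers (RANSAC-style, exhaustive).

    Every pair of points with distinct x proposes a line; the proposal with the
    most points within `tol` (vertical distance) wins, and a least-squares fit
    on those inliers gives the final (a, b). Returns (a, b, inlier_mask).
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # such a pair cannot define y = a*x + b
        a = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = ys[i] - a * xs[i]
        inliers = np.abs(ys - (a * xs + b)) <= tol
        if best is None or inliers.sum() > best.sum():
            best = inliers
    if best is None:
        raise ValueError("need at least two points with distinct x")
    a, b = np.polyfit(xs[best], ys[best], 1)  # refit on inliers only
    return float(a), float(b), best


def localize_line(img, r0, half_band=3, tol=1.0):
    """Refine a ruling line near row r0 (e.g. a dip in the row projection profile).

    Takes the darkest pixel of each column inside rows [r0 - half_band,
    r0 + half_band] as a candidate point, then fits the points robustly.
    """
    img = np.asarray(img, dtype=float)
    top = max(r0 - half_band, 0)
    band = img[top:r0 + half_band + 1]
    ys = top + np.argmin(band, axis=0)  # darkest row in each column
    xs = np.arange(img.shape[1])
    return robust_line_fit(xs, ys, tol)
```

The exhaustive pair search is quadratic in the number of columns. For wide snippets, a randomized RANSAC with a fixed number of iterations would be the usual substitute.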