Johan Zenk, Florian Kordon, Martin Mayr, Mathias Seuret, V. Christlein
{"title":"历史文献文字、字型、位置分类的自监督学习研究","authors":"Johan Zenk, Florian Kordon, Martin Mayr, Mathias Seuret, V. Christlein","doi":"10.1145/3604951.3605519","DOIUrl":null,"url":null,"abstract":"In the context of automated classification of historical documents, we investigate three contemporary self-supervised learning (SSL) techniques (SimSiam, Dino, and VICReg) for the pre-training of three different document analysis tasks, namely script-type, font-type, and location classification. Our study draws samples from multiple datasets that contain images of manuscripts, prints, charters, and letters. The representations derived via pre-text training are taken as inputs for k-NN classification and a parametric linear classifier. The latter is placed atop the pre-trained backbones to enable fine-tuning of the entire network to further improve the classification by exploiting task-specific label data. The network’s final performance is assessed via independent test sets obtained from the ICDAR2021 Competition on Historical Document Classification. We empirically show that representations learned with SSL are significantly better suited for subsequent document classification than features generated by commonly used transfer learning on ImageNet.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Investigations on Self-supervised Learning for Script-, Font-type, and Location Classification on Historical Documents\",\"authors\":\"Johan Zenk, Florian Kordon, Martin Mayr, Mathias Seuret, V. Christlein\",\"doi\":\"10.1145/3604951.3605519\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the context of automated classification of historical documents, we investigate three contemporary self-supervised learning (SSL) techniques (SimSiam, Dino, and VICReg) for the pre-training of three different document analysis tasks, namely script-type, font-type, and location classification. Our study draws samples from multiple datasets that contain images of manuscripts, prints, charters, and letters. The representations derived via pre-text training are taken as inputs for k-NN classification and a parametric linear classifier. The latter is placed atop the pre-trained backbones to enable fine-tuning of the entire network to further improve the classification by exploiting task-specific label data. The network’s final performance is assessed via independent test sets obtained from the ICDAR2021 Competition on Historical Document Classification. We empirically show that representations learned with SSL are significantly better suited for subsequent document classification than features generated by commonly used transfer learning on ImageNet.\",\"PeriodicalId\":375632,\"journal\":{\"name\":\"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3604951.3605519\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3604951.3605519","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Investigations on Self-supervised Learning for Script-, Font-type, and Location Classification on Historical Documents
In the context of automated classification of historical documents, we investigate three contemporary self-supervised learning (SSL) techniques (SimSiam, Dino, and VICReg) for the pre-training of three different document analysis tasks, namely script-type, font-type, and location classification. Our study draws samples from multiple datasets that contain images of manuscripts, prints, charters, and letters. The representations derived via pre-text training are taken as inputs for k-NN classification and a parametric linear classifier. The latter is placed atop the pre-trained backbones to enable fine-tuning of the entire network to further improve the classification by exploiting task-specific label data. The network’s final performance is assessed via independent test sets obtained from the ICDAR2021 Competition on Historical Document Classification. We empirically show that representations learned with SSL are significantly better suited for subsequent document classification than features generated by commonly used transfer learning on ImageNet.