Papy-S-Net: A Siamese Network to match papyrus fragments
A. Pirrone, M. Beurton-Aimar, N. Journet
Proceedings of the 5th International Workshop on Historical Document Imaging and Processing, 2019-09-20
DOI: 10.1145/3352631.3352646 (https://doi.org/10.1145/3352631.3352646)
Citations: 12
Abstract
Like all heritage documents, papyri are studied in depth by scientists. While large volumes of papyri have been digitized and indexed, many still await processing. Studying a papyrus takes time, mainly because papyri are rarely available in one piece. Papyrologists must review a large number of fragments, find those that belong together, and assemble them before they can analyze the text. Unfortunately, some fragments no longer exist. The result is a time-consuming puzzle in which not all the pieces are available and fragment boundaries do not match perfectly.

This article describes a method that helps papyrologists save time on this complex puzzle. We provide a solution in which an expert uses a fragment as a query and retrieves the fragments that belong to the same papyrus. The main contribution is a deep siamese network architecture, called Papy-S-Net (for Papyrus-Siamese-Network), designed for papyrus fragment matching. The network is trained and validated on approximately 500 papyrus fragments. To train and validate the network, we extract patches from the papyrus fragments to create our ground truth. We compare the results of Papy-S-Net with the earlier work of Koch et al. [14], which proposes a siamese network for matching written symbols; Papy-S-Net outperforms Koch et al.'s network. We also evaluate our approach on a real use case, in which Papy-S-Net achieves 79% correct matches.
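The abstract states that the ground truth is built by extracting patches from papyrus fragments, but it does not give the procedure. The following is a minimal sketch of one plausible way to turn fragments into labelled patch pairs for a siamese network. It is not the authors' implementation: the sliding-window extraction, the patch size and stride, the rule that two patches form a positive pair when they come from the same papyrus, the class balancing, and the names `extract_patches` and `build_pairs` are all assumptions made for illustration. Images are represented as plain lists of rows so that the example needs only the standard library.

```python
import itertools
import random


def extract_patches(image, size, stride):
    """Slide a size x size window over a 2-D image given as a list of rows."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patch = [row[left:left + size] for row in image[top:top + size]]
            patches.append(patch)
    return patches


def build_pairs(fragments, size, stride, seed=0):
    """Build (patch_a, patch_b, label) triples from (papyrus_id, image) fragments.

    Assumption: label is 1 when both patches come from the same papyrus,
    and 0 otherwise. The larger class is subsampled so both are balanced.
    """
    rng = random.Random(seed)
    labelled = [
        (papyrus_id, patch)
        for papyrus_id, image in fragments
        for patch in extract_patches(image, size, stride)
    ]
    positives, negatives = [], []
    for (id_a, patch_a), (id_b, patch_b) in itertools.combinations(labelled, 2):
        if id_a == id_b:
            positives.append((patch_a, patch_b, 1))
        else:
            negatives.append((patch_a, patch_b, 0))
    n = min(len(positives), len(negatives))
    pairs = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(pairs)
    return pairs


if __name__ == "__main__":
    # Two toy 4x4 "fragments" standing in for scans of two different papyri.
    fragment_a = [[1] * 4 for _ in range(4)]
    fragment_b = [[2] * 4 for _ in range(4)]
    pairs = build_pairs([("papyrus-1", fragment_a), ("papyrus-2", fragment_b)], 2, 2)
    print(len(pairs), "pairs,", sum(label for _, _, label in pairs), "positive")
```

In a real pipeline each triple would be fed to the two branches of the siamese network, with the label as the training target. Enumerating all pairs grows quadratically with the number of patches, so a dataset of several hundred fragments would likely need random pair sampling instead.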