
Proceedings of the 7th International Workshop on Historical Document Imaging and Processing: Latest Publications

Drawing the Line: A Dual Evaluation Approach for Shaping Ground Truth in Image Retrieval Using Rich Visual Embeddings of Historical Images
David Tschirschwitz, Franziska Klemstein, Henning Schmidgen, V. Rodehorst
Images contain rich visual information that can be interpreted in multiple ways, each of which may be correct. However, current retrieval systems in computer vision predominantly focus on content-based and instance-based image retrieval, while other facets relevant to the querying person, such as temporal aspects or image syntax, are often neglected. This study addresses this issue by examining a retrieval system in a domain-specific document processing pipeline. A retrieval evaluation dataset, which focuses on the aforementioned tasks, is utilized to compare different promising approaches. Subsequently, a qualitative study is conducted to compare the usability of the retrieval results with their corresponding metrics. This comparison reveals a discrepancy between the model that scores best on the performance metrics and the model that provides better results for answering potential research questions. While current models such as DINO and CLIP demonstrate their ability to retrieve images based on their semantics and contents with high reliability, they exhibit limited capabilities in retrieving other facets.
{"title":"Drawing the Line: A Dual Evaluation Approach for Shaping Ground Truth in Image Retrieval Using Rich Visual Embeddings of Historical Images","authors":"David Tschirschwitz, Franziska Klemstein, Henning Schmidgen, V. Rodehorst","doi":"10.1145/3604951.3605524","DOIUrl":"https://doi.org/10.1145/3604951.3605524","url":null,"abstract":"Images contain rich visual information that can be interpreted in multiple ways, each of which may be correct. However, current retrieval systems in computer vision predominantly focus on content-based and instance-based image retrieval, while other facets relevant to the querying person, such as temporal aspects or image syntax, are often neglected. This study addresses this issue by examining a retrieval system in a domain-specific document processing pipeline. A retrieval evaluation dataset, which focuses on the aforementioned tasks, is utilized to compare different promising approaches. Subsequently, a qualitative study is conducted to compare the usability of the retrieval results with their corresponding metrics. This comparison reveals a discrepancy between the best-performing model by performance metrics and the model that provides better results for answering potential research questions. While current models such as DINO and CLIP demonstrate their ability to retrieve images based on their semantics and contents with high reliability, they exhibit limited capabilities in retrieving other facets.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124833123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
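As an illustration of the kind of embedding-based retrieval evaluated in the abstract above, the sketch below builds global image descriptors with a generic pretrained backbone and ranks a gallery by cosine similarity. The use of torchvision's ResNet-50 as the encoder is an assumption made for the example; the paper's models (DINO, CLIP) would be swapped in at the same place.

```python
# Minimal embedding-based image retrieval sketch (assumption: torchvision ResNet-50
# stands in for the DINO/CLIP encoders discussed in the paper).
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Backbone without the classification head -> 2048-d global descriptors.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    """Return L2-normalized descriptors for a list of image paths."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return F.normalize(backbone(batch), dim=1)

def retrieve(query_path, gallery_paths, k=5):
    """Rank gallery images by cosine similarity to the query."""
    q = embed([query_path])                 # (1, 2048)
    g = embed(gallery_paths)                # (N, 2048)
    scores = (q @ g.T).squeeze(0)           # cosine similarity (vectors are normalized)
    top = scores.topk(min(k, len(gallery_paths)))
    return [(gallery_paths[i], float(s)) for s, i in zip(top.values, top.indices)]
```

Swapping in a different encoder only changes the `backbone` and `preprocess` definitions; the retrieval step itself stays the same, which is what makes the facet comparison in the paper possible.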
PapyTwin net: a Twin network for Greek letters detection on ancient Papyri
Manh-Tu Vu, M. Beurton-Aimar
Ancient historical documents, such as Greek papyri, are crucial for understanding human knowledge and history. However, transcribing and translating these documents manually is a difficult and time-consuming process. As a result, an automatic algorithm is required to identify and interpret the writing on these ancient historical documents with accuracy and dependability. In this work, we introduce PapyTwin, a deep neural network consisting of two subnetworks, the first and second twin, that cooperate to address the challenge of detecting Greek letters on ancient papyri. While the first twin network aims at making the letter size uniform across the images, the second twin network predicts letter bounding boxes on these size-normalized images. Experimental results show that our proposed approach outperforms the baseline model by a large margin, suggesting that a uniform letter size across images is a crucial factor in enhancing the performance of detection networks on ancient documents such as Greek papyri.
{"title":"PapyTwin net: a Twin network for Greek letters detection on ancient Papyri","authors":"Manh-Tu Vu, M. Beurton-Aimar","doi":"10.1145/3604951.3605522","DOIUrl":"https://doi.org/10.1145/3604951.3605522","url":null,"abstract":"Ancient historical documents, such as Greek papyri, are crucial for understanding human knowledge and history. However, transcribing and translating these documents manually is a difficult and time-consuming process. As a result, an automatic algorithm is required to identify and interpret the writing on these ancient historical documents with accuracy and dependability. In this work, we introduce PapyTwin, a deep neural network which consists of two subnetworks, the first and second twin, that cooperate together to address the challenge of detecting Greek letters on ancient papyri. While the first twin network aims at uniforming the letter size across the images, the second twin network predicts letter bounding boxes based on these letter-uniformed images. Experiment results show that our proposing approach outperformed the baseline model by a large margin, suggesting that uniform letter size across images is a crucial factor in enhancing the performance of detection networks on ancient documents such as Greek papyri.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"75 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121027722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
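The two-stage idea described in the abstract above (one network estimates a scaling that makes letter sizes uniform, a second detects letters on the rescaled image) can be sketched roughly as follows. The scale-regression head and the use of a torchvision Faster R-CNN as the detector are illustrative assumptions, not the paper's actual architecture.

```python
# Rough sketch of a two-stage "uniform-then-detect" pipeline, loosely following the
# PapyTwin idea. The concrete networks here are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models.detection import fasterrcnn_resnet50_fpn

class ScaleEstimator(nn.Module):
    """First twin (assumed form): predicts a single rescaling factor per image."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        # exp() keeps the predicted scale strictly positive
        return torch.exp(self.head(self.features(x).flatten(1)))

scale_net = ScaleEstimator().eval()
detector = fasterrcnn_resnet50_fpn(weights=None, num_classes=2).eval()  # letter vs. background

@torch.no_grad()
def detect_letters(image):                       # image: (3, H, W) float tensor in [0, 1]
    s = float(scale_net(image.unsqueeze(0)))     # stage 1: estimate letter-size scale
    resized = F.interpolate(image.unsqueeze(0), scale_factor=s,
                            mode="bilinear", align_corners=False).squeeze(0)
    boxes = detector([resized])[0]["boxes"]      # stage 2: detect on the rescaled image
    return boxes / s                             # map boxes back to original coordinates
```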
Document Layout Analysis with Deep Learning and Heuristics
V. Rezanezhad, Konstantin Baierer, Mike Gerber, Kai Labusch, Clemens Neudecker
The automated yet highly accurate layout analysis (segmentation) of historical document images remains a key challenge for the improvement of Optical Character Recognition (OCR) results. But historical documents exhibit a wide array of features that disturb layout analysis, such as multiple columns, drop capitals and illustrations, skewed or curved text lines, noise, annotations, etc. We present a document layout analysis (DLA) system for historical documents implemented by pixel-wise segmentation using convolutional neural networks. In addition, heuristic methods are applied to detect marginals and to determine the reading order of text regions. Our system can detect more layout classes (e.g. initials, marginals) and achieves higher accuracy than competitive approaches. We describe the algorithm, the different models and how they were trained and discuss our results in comparison to the state-of-the-art on the basis of three historical document datasets.
{"title":"Document Layout Analysis with Deep Learning and Heuristics","authors":"V. Rezanezhad, Konstantin Baierer, Mike Gerber, Kai Labusch, Clemens Neudecker","doi":"10.1145/3604951.3605513","DOIUrl":"https://doi.org/10.1145/3604951.3605513","url":null,"abstract":"The automated yet highly accurate layout analysis (segmentation) of historical document images remains a key challenge for the improvement of Optical Character Recognition (OCR) results. But historical documents exhibit a wide array of features that disturb layout analysis, such as multiple columns, drop capitals and illustrations, skewed or curved text lines, noise, annotations, etc. We present a document layout analysis (DLA) system for historical documents implemented by pixel-wise segmentation using convolutional neural networks. In addition, heuristic methods are applied to detect marginals and to determine the reading order of text regions. Our system can detect more layout classes (e.g. initials, marginals) and achieves higher accuracy than competitive approaches. We describe the algorithm, the different models and how they were trained and discuss our results in comparison to the state-of-the-art on the basis of three historical document datasets.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126450507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
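The heuristic side of such a pipeline is easy to illustrate. The sketch below assumes segmentation has already produced text-region bounding boxes and orders them with a simple column-then-vertical rule; it is a generic reading-order heuristic, not the authors' specific method.

```python
# Simple reading-order heuristic over detected text regions (x0, y0, x1, y1).
# Generic illustration only; the paper's own heuristics are more elaborate.

def reading_order(regions, page_width, n_columns=2):
    """Group regions into columns by horizontal center, then read each column
    top to bottom, columns left to right."""
    col_width = page_width / n_columns
    def key(r):
        x0, y0, x1, y1 = r
        column = int(((x0 + x1) / 2) // col_width)
        return (column, y0, x0)
    return sorted(regions, key=key)

# Example: two columns on a 1000 px wide page.
regions = [(520, 80, 950, 200),   # right column, top
           (50, 90, 480, 210),    # left column, top
           (60, 240, 470, 400),   # left column, below
           (530, 230, 940, 380)]  # right column, below
print(reading_order(regions, page_width=1000))
# -> left-top, left-below, right-top, right-below
```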
Laypa: A Novel Framework for Applying Segmentation Networks to Historical Documents
Stefan Klut, Rutger van Koert, R. Sluijter
We present novel software to process scans of historical documents and extract their layout information. We do this using a ResNet backbone with a feature pyramid head. We extract region information directly into PageXML. For baseline extraction, we use a two-stage processing approach. The software has been applied successfully to several projects. The results show the feasibility of automatically labelling text lines and regions in historical documents.
{"title":"Laypa: A Novel Framework for Applying Segmentation Networks to Historical Documents","authors":"Stefan Klut, Rutger van Koert, R. Sluijter","doi":"10.1145/3604951.3605520","DOIUrl":"https://doi.org/10.1145/3604951.3605520","url":null,"abstract":"We present novel software to process scans of historical documents to extract their layout information. We do this using a ResNet backbone with a feature pyramid head. We extract region information directly into PageXML. For baseline extraction, we use a two stage processing approach. The software has been applied successfully to several projects. The results show the feasibility to automatically label text lines and regions in historical documents.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"65 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116468688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
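Exporting detected regions to PageXML, as described above, can be sketched with the standard library alone. The element and attribute names below follow the PAGE XML convention as the author recalls it; treat the exact schema details (namespace URL, attribute names) as assumptions to be checked against the official PAGE specification.

```python
# Minimal PageXML-style export of region polygons using only the standard library.
# Schema details (namespace, element names) are written from memory and should be
# verified against the official PAGE XML specification.
import xml.etree.ElementTree as ET

PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"

def regions_to_pagexml(image_name, width, height, regions, out_path):
    """regions: list of (region_id, [(x, y), ...]) polygons in pixel coordinates."""
    ET.register_namespace("", PAGE_NS)
    root = ET.Element(f"{{{PAGE_NS}}}PcGts")
    page = ET.SubElement(root, f"{{{PAGE_NS}}}Page",
                         imageFilename=image_name,
                         imageWidth=str(width), imageHeight=str(height))
    for region_id, polygon in regions:
        region = ET.SubElement(page, f"{{{PAGE_NS}}}TextRegion", id=region_id)
        points = " ".join(f"{x},{y}" for x, y in polygon)
        ET.SubElement(region, f"{{{PAGE_NS}}}Coords", points=points)
    ET.ElementTree(root).write(out_path, xml_declaration=True, encoding="utf-8")

regions_to_pagexml("scan_0001.jpg", 2480, 3508,
                   [("r1", [(100, 120), (1200, 120), (1200, 900), (100, 900)])],
                   "scan_0001.xml")
```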
Feature Mixing for Writer Retrieval and Identification on Papyri Fragments
Marco Peer, Robert Sablatnig
This paper proposes a deep-learning-based approach to writer retrieval and identification for papyri, with a focus on identifying fragments associated with a specific writer and those corresponding to the same image. We present a novel neural network architecture that combines a residual backbone with a feature mixing stage to improve retrieval performance, and the final descriptor is derived from a projection layer. The methodology is evaluated on two benchmarks: PapyRow, where we achieve a mAP of 26.6 % and 24.9 % on writer and page retrieval, and HisFragIR20, showing state-of-the-art performance (44.0 % and 29.3 % mAP). Furthermore, our network has an accuracy of 28.7 % for writer identification. Additionally, we conduct experiments on the influence of two binarization techniques on fragments and show that binarizing does not enhance performance. Our code and models are available to the community at https://github.com/marco-peer/hip23.
{"title":"Feature Mixing for Writer Retrieval and Identification on Papyri Fragments","authors":"Marco Peer, Robert Sablatnig","doi":"10.1145/3604951.3605515","DOIUrl":"https://doi.org/10.1145/3604951.3605515","url":null,"abstract":"This paper proposes a deep-learning-based approach to writer retrieval and identification for papyri, with a focus on identifying fragments associated with a specific writer and those corresponding to the same image. We present a novel neural network architecture that combines a residual backbone with a feature mixing stage to improve retrieval performance, and the final descriptor is derived from a projection layer. The methodology is evaluated on two benchmarks: PapyRow, where we achieve a mAP of 26.6 % and 24.9 % on writer and page retrieval, and HisFragIR20, showing state-of-the-art performance (44.0 % and 29.3 % mAP). Furthermore, our network has an accuracy of 28.7 % for writer identification. Additionally, we conduct experiments on the influence of two binarization techniques on fragments and show that binarizing does not enhance performance. Our code and models are available to the community at https://github.com/marco-peer/hip23.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130591248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
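A rough sketch of the descriptor pipeline described above: a residual backbone produces a spatial feature map, a small mixing stage combines the flattened features, and a projection layer yields the final retrieval descriptor. The specific layer sizes and the MLP-style mixing block used here are assumptions for illustration, not the paper's architecture.

```python
# Illustrative descriptor network: residual backbone -> feature mixing -> projection.
# Layer shapes and the MLP-style mixing block are assumptions for the sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class WriterDescriptor(nn.Module):
    def __init__(self, dim=512, out_dim=256):
        super().__init__()
        resnet = models.resnet34(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # keep spatial map
        self.mix = nn.Sequential(                 # mixes the flattened spatial tokens
            nn.LayerNorm(dim),
            nn.Linear(dim, dim), nn.GELU(),
            nn.Linear(dim, dim),
        )
        self.project = nn.Linear(dim, out_dim)    # final retrieval descriptor

    def forward(self, x):                         # x: (B, 3, H, W)
        fmap = self.backbone(x)                   # (B, 512, h, w)
        tokens = fmap.flatten(2).transpose(1, 2)  # (B, h*w, 512)
        mixed = tokens + self.mix(tokens)         # residual feature mixing
        descriptor = self.project(mixed.mean(dim=1))
        return F.normalize(descriptor, dim=1)     # cosine-comparable embedding

emb = WriterDescriptor()(torch.randn(2, 3, 224, 224))
print(emb.shape)  # torch.Size([2, 256])
```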
Handwritten Text Recognition from Crowdsourced Annotations
Solène Tarride, Tristan Faine, Mélodie Boillet, H. Mouchère, Christopher Kermorvant
In this paper, we explore different ways of training a model for handwritten text recognition when multiple imperfect or noisy transcriptions are available. We consider various training configurations, such as selecting a single transcription, retaining all transcriptions, or computing an aggregated transcription from all available annotations. In addition, we evaluate the impact of quality-based data selection, where samples with low agreement are removed from the training set. Our experiments are carried out on municipal registers of the city of Belfort (France) written between 1790 and 1946. The results show that computing a consensus transcription or training on multiple transcriptions are good alternatives. However, selecting training samples based on the degree of agreement between annotators introduces a bias in the training data and does not improve the results. Our dataset is publicly available on Zenodo.
{"title":"Handwritten Text Recognition from Crowdsourced Annotations","authors":"Solène Tarride, Tristan Faine, Mélodie Boillet, H. Mouchère, Christopher Kermorvant","doi":"10.1145/3604951.3605517","DOIUrl":"https://doi.org/10.1145/3604951.3605517","url":null,"abstract":"In this paper, we explore different ways of training a model for handwritten text recognition when multiple imperfect or noisy transcriptions are available. We consider various training configurations, such as selecting a single transcription, retaining all transcriptions, or computing an aggregated transcription from all available annotations. In addition, we evaluate the impact of quality-based data selection, where samples with low agreement are removed from the training set. Our experiments are carried out on municipal registers of the city of Belfort (France) written between 1790 and 1946. The results show that computing a consensus transcription or training on multiple transcriptions are good alternatives. However, selecting training samples based on the degree of agreement between annotators introduces a bias in the training data and does not improve the results. Our dataset is publicly available on Zenodo.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121626613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
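One simple way to realize the "aggregated transcription" setting mentioned in the abstract is to pick, for each line, the transcription closest on average to all others (a medoid under edit distance). The sketch below is a generic illustration of that idea, not the aggregation method used in the paper.

```python
# Pick a consensus transcription as the medoid under Levenshtein distance.
# Generic illustration of the "aggregated transcription" idea, not the paper's method.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def consensus(transcriptions):
    """Return the transcription with the smallest total distance to all others."""
    return min(transcriptions,
               key=lambda t: sum(levenshtein(t, other) for other in transcriptions))

annotations = ["Rue des Forges 12", "Rue des Forges 12", "Rue de Forges 12", "Rue des Forge 12"]
print(consensus(annotations))  # "Rue des Forges 12"
```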
DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents
Furkan Simsek, Brian Pfitzmann, Hendrik Raetz, Jona Otholt, Haojin Yang, C. Meinel
In this work, we propose DocLangID, a transfer learning approach to identify the language of unlabeled historical documents. We achieve this by first leveraging labeled data from a different but related domain of historical documents. Secondly, we implement a distance-based few-shot learning approach to adapt a convolutional neural network to new languages of the unlabeled dataset. By introducing small amounts of manually labeled examples from the set of unlabeled images, our feature extractor adapts better to new and different data distributions of historical documents. We show that such a model can be effectively fine-tuned for the unlabeled set of images by only reusing the same few-shot examples. We showcase our work across 10 languages that mostly use the Latin script. Our experiments on historical documents demonstrate that our combined approach improves the language identification performance, achieving 74% recognition accuracy on the four unseen languages of the unlabeled dataset.
{"title":"DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents","authors":"Furkan Simsek, Brian Pfitzmann, Hendrik Raetz, Jona Otholt, Haojin Yang, C. Meinel","doi":"10.1145/3604951.3605512","DOIUrl":"https://doi.org/10.1145/3604951.3605512","url":null,"abstract":"In this work, we propose DocLangID, a transfer learning approach to identify the language of unlabeled historical documents. We achieve this by first leveraging labeled data from a different but related domain of historical documents. Secondly, we implement a distance-based few-shot learning approach to adapt a convolutional neural network to new languages of the unlabeled dataset. By introducing small amounts of manually labeled examples from the set of unlabeled images, our feature extractor develops a better adaptability towards new and different data distributions of historical documents. We show that such a model can be effectively fine-tuned for the unlabeled set of images by only reusing the same few-shot examples. We showcase our work across 10 languages that mostly use the Latin script. Our experiments on historical documents demonstrate that our combined approach improves the language identification performance, achieving 74% recognition accuracy on the four unseen languages of the unlabeled dataset.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130911370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
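The distance-based few-shot step described above can be illustrated with a prototypical-network-style classifier: each language's prototype is the mean embedding of its few labeled examples, and an unlabeled page is assigned to the nearest prototype. The `embed` function below is a placeholder assumption; in the paper it would be the fine-tuned convolutional feature extractor.

```python
# Prototype-based few-shot language identification (illustrative sketch).
# `embed` is a placeholder for the fine-tuned CNN feature extractor from the paper.
import torch
import torch.nn.functional as F

def embed(images: torch.Tensor) -> torch.Tensor:
    """Placeholder embedding: (B, 3, H, W) -> L2-normalized (B, D) descriptors."""
    return F.normalize(images.flatten(1)[:, :128], dim=1)  # stand-in for a real CNN

def build_prototypes(support_images, support_labels, n_classes):
    """Mean embedding per language from the few manually labeled examples."""
    z = embed(support_images)
    protos = torch.stack([z[support_labels == c].mean(dim=0) for c in range(n_classes)])
    return F.normalize(protos, dim=1)

def classify(query_images, prototypes):
    """Assign each query page to the language with the nearest prototype."""
    q = embed(query_images)
    distances = torch.cdist(q, prototypes)     # Euclidean distance to each prototype
    return distances.argmin(dim=1)

# Toy usage: 4 languages, 5 support pages each, 3 query pages.
support = torch.rand(20, 3, 64, 64)
labels = torch.arange(4).repeat_interleave(5)
protos = build_prototypes(support, labels, n_classes=4)
print(classify(torch.rand(3, 3, 64, 64), protos))
```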
DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis
Lars Vögtlin, Anna Scius-Bertrand, Paul Maergner, Andreas Fischer, R. Ingold
Deep learning methods have shown strong performance in solving tasks for historical document image analysis. However, despite current libraries and frameworks, programming an experiment or a set of experiments and executing them can be time-consuming. This is why we propose an open-source deep learning framework, DIVA-DAF, which is based on PyTorch Lightning and specifically designed for historical document analysis. Pre-implemented tasks such as segmentation and classification can be easily used or customized. It is also easy to create one’s own tasks with the benefit of powerful modules for loading data, even large data sets, and different forms of ground truth. The applications conducted have demonstrated time savings for the programming of a document analysis task, as well as for different scenarios such as pre-training or changing the architecture. Thanks to its data module, the framework also makes it possible to reduce model training time significantly.
{"title":"DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis","authors":"Lars Vögtlin, Anna Scius-Bertrand, Paul Maergner, Andreas Fischer, R. Ingold","doi":"10.1145/3604951.3605511","DOIUrl":"https://doi.org/10.1145/3604951.3605511","url":null,"abstract":"Deep learning methods have shown strong performance in solving tasks for historical document image analysis. However, despite current libraries and frameworks, programming an experiment or a set of experiments and executing them can be time-consuming. This is why we propose an open-source deep learning framework, DIVA-DAF, which is based on PyTorch Lightning and specifically designed for historical document analysis. Pre-implemented tasks such as segmentation and classification can be easily used or customized. It is also easy to create one’s own tasks with the benefit of powerful modules for loading data, even large data sets, and different forms of ground truth. The applications conducted have demonstrated time savings for the programming of a document analysis task, as well as for different scenarios such as pre-training or changing the architecture. Thanks to its data module, the framework also allows to reduce the time of model training significantly.","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115552379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
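For readers unfamiliar with the PyTorch Lightning layer that DIVA-DAF builds on, the fragment below shows the general shape of a Lightning task module and training run. It is a generic Lightning example, not DIVA-DAF's own task or data-module API.

```python
# Generic PyTorch Lightning task module, illustrating the framework layer that
# DIVA-DAF builds on. This is not DIVA-DAF's own API.
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl

class PageClassificationTask(pl.LightningModule):
    def __init__(self, n_classes=4, lr=1e-3):
        super().__init__()
        self.save_hyperparameters()
        self.model = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, n_classes),
        )

    def training_step(self, batch, batch_idx):
        images, labels = batch
        loss = F.cross_entropy(self.model(images), labels)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

# A DataLoader over (image, label) pairs would be passed to the trainer:
# trainer = pl.Trainer(max_epochs=10)
# trainer.fit(PageClassificationTask(), train_dataloaders=train_loader)
```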
Proceedings of the 7th International Workshop on Historical Document Imaging and Processing (front matter)
{"title":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","authors":"","doi":"10.1145/3604951","DOIUrl":"https://doi.org/10.1145/3604951","url":null,"abstract":"","PeriodicalId":375632,"journal":{"name":"Proceedings of the 7th International Workshop on Historical Document Imaging and Processing","volume":"285 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124550845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1