Advances on the Transcription of Historical Manuscripts based on Multimodality, Interactivity and Crowdsourcing

IberSPEECH Conference. Pub Date: 2018-11-21. DOI: 10.4995/Thesis/10251/86137

Emilio Granell, C. Martínez-Hinarejos, Verónica Romero

Citations: 1

Abstract

Natural Language Processing (NLP) is an interdisciplinary research field spanning Computer Science, Linguistics, and Pattern Recognition that studies, among other topics, the use of human natural languages in Human-Computer Interaction (HCI). Most NLP research tasks can be applied to real-world problems. This is the case for natural language recognition and natural language translation, which can be used to build automatic systems for document transcription and document translation.

For digitalised handwritten text documents, transcription provides easy digital access to the contents, since simple image digitalisation supports, in most cases, only search by image and not by linguistic content (keywords, expressions, syntactic or semantic categories). Transcription is even more important for historical manuscripts, since most of these documents are unique and preserving their contents is crucial for cultural and historical reasons.

The transcription of historical manuscripts is usually done by paleographers, who are experts in ancient scripts and vocabulary. Recently, Handwritten Text Recognition (HTR) has become a common tool for assisting paleographers by providing a draft transcription that they can amend with more or less sophisticated methods. This draft is useful only when its error rate is low enough to make amending it more comfortable than transcribing from scratch. Obtaining a draft transcription with an acceptably low error rate is therefore crucial for incorporating this NLP technology into the transcription process.

The work described in this thesis focuses on improving the draft transcription offered by an HTR system, with the aim of reducing the effort paleographers need to obtain the actual transcription of digitalised historical manuscripts.

This problem is addressed from three different but complementary scenarios:

· Multimodality: HTR systems allow paleographers to speed up the manual transcription process, since they can correct a draft transcription instead of starting from nothing. An alternative is to obtain the draft by dictating the contents to an Automatic Speech Recognition (ASR) system. When both sources (image and speech) are available, they can be combined multimodally, and an iterative process can be used to refine the final hypothesis.

· Interactivity: Assistive technologies reduce the time and human effort required to obtain the actual transcription, because the assistive system and the paleographer cooperate to generate a perfect transcription. Multimodal feedback can provide the assistive system with additional sources of information, using signals that represent the same whole sequence of words to transcribe (e.g. a text image and the speech of a dictation of its contents), or that represent just a word or character to correct (e.g. an on-line handwritten word).

· Crowdsourcing: Open distributed collaboration emerges as a powerful tool for massive transcription at a relatively low cost, since the supervision effort required from the paleographer may be dramatically reduced. Multimodal combination makes it possible to use speech dictation of handwritten text lines in a multimodal crowdsourcing platform, where collaborators can provide their speech from their own mobile devices instead of desktop or laptop computers, which makes it possible to recruit more collaborators.
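The abstract does not say how the image and speech hypotheses are combined or how the error rate of a draft is measured. As an illustration only, the Python sketch below assumes that each recogniser returns an N-best list of whole-line hypotheses with probabilities, combines the two lists by log-linear interpolation, and scores drafts with the standard word error rate (word-level edit distance divided by the reference length). The function names, the `weight` and `floor` parameters, and the toy Spanish line are all hypothetical and are not taken from the thesis.

```python
from math import log


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # delete a reference word
                            curr[j - 1] + 1,       # insert a hypothesis word
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1] / len(ref)


def combine_nbest(htr_nbest, asr_nbest, weight=0.5, floor=1e-6):
    """Pick the best hypothesis by log-linear interpolation of two N-best lists.

    Assumption: each argument maps a hypothesis string to its probability.
    A hypothesis missing from one list receives the small `floor` probability.
    """
    def score(text):
        p_htr = htr_nbest.get(text, floor)
        p_asr = asr_nbest.get(text, floor)
        return weight * log(p_htr) + (1 - weight) * log(p_asr)

    return max(set(htr_nbest) | set(asr_nbest), key=score)


# Toy data, invented for illustration (not from the thesis).
reference = "en el nombre de dios amen"
htr_nbest = {
    "en el nombre de dias amen": 0.5,
    "en el nombre de dios amen": 0.3,
    "en el hombre de dios amen": 0.2,
}
asr_nbest = {
    "en el nombre de dios a men": 0.45,
    "en el nombre de dios amen": 0.40,
    "en el hombre de dios amen": 0.15,
}

htr_top = max(htr_nbest, key=htr_nbest.get)
asr_top = max(asr_nbest, key=asr_nbest.get)
combined = combine_nbest(htr_nbest, asr_nbest)

for name, draft in [("HTR", htr_top), ("ASR", asr_top), ("combined", combined)]:
    print(f"{name:9s} WER = {word_error_rate(reference, draft):.2f}  {draft}")
```

On this toy data each recogniser's top hypothesis contains an error, while the hypothesis that both recognisers rank second wins after interpolation and matches the reference. Real systems typically combine richer structures than N-best lists, so this is only a sketch of the idea that two modalities can correct each other's mistakes.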
