International Journal of Digital Humanities: Latest Articles

Assessing advanced handwritten text recognition engines for digitizing historical documents.
Pub Date : 2025-01-01 Epub Date: 2025-05-12 DOI: 10.1007/s42803-025-00100-0
C A Romein, A Rabus, G Leifert, P B Ströbel

This study evaluates the performance of state-of-the-art Handwritten Text Recognition (HTR) engines (PyLaia, HTR+, IDA, TrOCR-f, and Transkribus' proprietary Transformer-based "supermodel" Titan) for digitizing historical documents. Using a diverse range of datasets that include different scripts, this research assesses each engine's accuracy and efficiency in handling multilingual content, complex styles, abbreviations, and historical orthography. Results indicate that, while all engines can be trained or fine-tuned to improve performance, Titan and TrOCR-f exhibit superior out-of-the-box capabilities for Latin-script documents. PyLaia, IDA, and HTR+ excel in specific non-Latin scripts when specifically trained or fine-tuned. This study underscores the importance of training, fine-tuning, and integrating language models, providing critical insights for future advancements in HTR technology and its application in the digital humanities.

Vol. 7, no. 1, pp. 115-134. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12202554/pdf/
Citations: 0
Online cultural heritage as a social machine: a socio-technical approach to digital infrastructure and ecosystems.
Pub Date : 2025-01-01 Epub Date: 2025-03-12 DOI: 10.1007/s42803-025-00097-6
Javier Pereda, Pip Willcox, Gustavo Candela, Alexander Sanchez, Patricia A Murrieta-Flores

The advent of digital technologies has profoundly transformed cultural and heritage sectors, providing new avenues for broader access and interactions with digital collections. This shift has enabled Online Cultural Heritage (OCH) to evolve into an extensive ecosystem. Given the complexity that emerges from these networks and stakeholders, it is crucial to develop a clearer understanding of the extensive terminology used in the sector and establish pathways to deconstruct this complexity. Therefore, this article's aim is threefold: 1) it examines how OCH ecosystems foster the ongoing reinterpretation and recontextualisation of cultural heritage collections through technological innovations and the Web. In doing so, it highlights the relevance of policy development and the establishment of ethical frameworks that address both human and technical complexities of Cultural Heritage (CH) knowledge; 2) using the Open Archival Information System (OAIS) as a framework and its terminology, the article maps the workflows and socio-technical actors of the OCH ecosystem; and 3) the article applies Callon's Process of Translation, a methodology for understanding how socio-technical networks evolve, and uses it to critically deconstruct digital infrastructures in OCH. This methodology enables the contextualisation and reinterpretation of cultural narratives across digital platforms, both online and offline, underscoring the dynamic interplay between technology, human agency, and cultural context. We explore how OCH ecosystems and other infrastructural ecosystems aid in preserving and facilitating engagement with open knowledge and research, and function as complex networks of cultural institutions interconnected through knowledge infrastructures.
Whilst the paper places the primary approach within UK infrastructures, it provides alternative perspectives from the Global South, particularly Latin America, to contrast and further illustrate a reflection on the current and future challenges behind a sustainable OCH ecosystem, its implications for further networks, and its potential as a model beyond the CH sector. Furthermore, this framework can become paramount to identifying obstacles and opportunities for digital infrastructures, establishing a nuanced understanding of OCH as a core infrastructural element in the generation of knowledge from digital collections or digital infrastructures around the world. Finally, we provide a glossary of terms to establish a common ground between the wide range of parties involved in OCH. CCS CONCEPTS • Digital libraries and archives • Information Integration • Cultural characteristics.

Vol. 7, no. 1, pp. 39-69. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12202677/pdf/
Citations: 0
RelChronVis: an interactive web application for visualizing the relative chronology of language changes
Pub Date : 2024-07-24 DOI: 10.1007/s42803-024-00086-1
Florian Wandl, Thilo H. K. Thelitz
Citations: 0
DOLD: a digital platform for conducting online language experiments and surveys
Pub Date : 2024-07-03 DOI: 10.1007/s42803-024-00085-2
Yik-Po Lai, Hin Tat Cheung
Citations: 0
Correction: Committing to reproducibility and explainability: using Git as a research journal
Pub Date : 2024-02-06 DOI: 10.1007/s42803-024-00084-3
Samuel J. Huskey
Citations: 0
Open Times: The future of critique in the age of (un)replicability
Pub Date : 2024-01-08 DOI: 10.1007/s42803-023-00081-y
Nathalie Cooke, Ronny Litvack-Katzman
Citations: 1
Digital humanities in the era of digital reproducibility: towards a fairest and post-computational framework
Pub Date : 2024-01-03 DOI: 10.1007/s42803-023-00079-6
Béatrice Joyeux-Prunel
Pages: 1-21
Citations: 0
"I have always found the whole area a minefield": Wikidata, historical lives, and knowledge infrastructure.
Pub Date : 2024-01-01 Epub Date: 2024-12-23 DOI: 10.1007/s42803-024-00090-5
James Baker, Ammandeep K Mahal

The rise of Wikidata represents a quiet revolution in knowledge infrastructure. This paper enquires into this knowledge base as an infrastructure and considers the implications of its centrality within our contemporary knowledge ecosystem. Rather than read Wikidata at scale, we employ a narrow frame through which to explore the ideologies Wikidata has adopted and reproduces. This frame is Beyond Notability, a knowledge base that seeks to document women's work in archaeology, history, and heritage between 1870 and 1950 through original archival research. Beyond Notability draws on and responds to the Wikidata data model, and this paper emerges from our experiences interacting with Wikidata to produce linked data biography. In foregrounding the tensions between historically specific phenomena and classificatory logics, our work stresses the value of using practice-based ontology development to investigate large-scale knowledge infrastructures at a time when the fabric of knowledge is at stake.

Vol. 6, no. 2, pp. 217-236. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12084240/pdf/
Citations: 0
Leveraging OCR and HTR cloud services towards data mobilisation of historical plant names.
Pub Date : 2024-01-01 Epub Date: 2024-11-28 DOI: 10.1007/s42803-024-00091-4
Jawad Sadek, Andreas Vlachidis, Victoria Pickering, Marco Humbel, Daniele Metilli, Mark Carine, Julianne Nyhan

We present our solution to the problem of how to mobilise (that is, extract and enrich) digital data from the analogue, printed book version of Sir Hans Sloane's copy of John Ray's Historia Plantarum, to create the first searchable facility of its kind for the plants contained in the Sloane Herbarium, housed in the Natural History Museum, UK. The data mobilisation workflow presented here enables the automatic detection of printed text and handwritten marginalia and annotations in Sir Hans Sloane's personal copy of John Ray's Historia Plantarum. The rationale for adopting Amazon's AWS Textract service and for developing a specialised information extraction workflow for mobilising printed text and handwritten annotations is discussed. Testing of our workflow demonstrates the need for human checking of outputs to ensure the accuracy of a large set of structured data comprising 7600 plant names and 4540 handwritten marginalia annotations. The links we have created serve as the first digital index to Sloane's Herbarium, a unique development in the longer analogue and digital format-history of these resources.

Vol. 6, no. 3, pp. 237-261. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12106164/pdf/
Citations: 0