"Epistemic consequences of unfair tools"
Ida Marie S Lassen, Ross Deans Kristensen-McLachlan, Mina Almasi, Kenneth Enevoldsen, Kristoffer L Nielbo
Digital Scholarship in the Humanities (2024-01-24). doi:10.1093/llc/fqad091

This article examines the epistemic consequences of unfair technologies used in digital humanities (DH). We connect bias analysis informed by the field of algorithmic fairness with perspectives on knowledge production in DH. We examine the fairness of Danish Named Entity Recognition tools through an innovative experimental method involving data augmentation and evaluate performance disparities using two metrics of algorithmic fairness: calibration within groups and balance for the positive class. Our results show that only two of the ten tested models comply with the fairness criteria. From an intersectional perspective, we shed light on how unequal performance across groups can lead to the exclusion and marginalization of certain social groups, leaving voices and experiences disregarded and silenced. We propose incorporating algorithmic fairness into the selection of tools in DH to help alleviate the risk of perpetuating silence and to move towards fairer and more inclusive research.
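The two fairness criteria named in the abstract can be illustrated with a small sketch. This is not the authors' code: the groups, scores, and labels below are invented, and a real NER evaluation would operate on token-level predictions.

```python
# Illustrative sketch of the two fairness criteria from the abstract,
# applied to hypothetical model output. Each record is
# (demographic_group, predicted_probability_of_entity, is_entity).

def calibration_within_groups(records):
    """A calibrated model's mean predicted probability matches the
    empirical positive rate within every group."""
    by_group = {}
    for group, p, y in records:
        by_group.setdefault(group, []).append((p, y))
    return {
        g: (sum(p for p, _ in pairs) / len(pairs),   # mean score
            sum(y for _, y in pairs) / len(pairs))   # empirical rate
        for g, pairs in by_group.items()
    }

def balance_for_positive_class(records):
    """Balance holds when true positives receive (roughly) the same
    mean score regardless of group membership."""
    by_group = {}
    for group, p, y in records:
        if y:
            by_group.setdefault(group, []).append(p)
    return {g: sum(ps) / len(ps) for g, ps in by_group.items()}

# Invented example: entities from the minority group get lower scores,
# violating balance for the positive class.
data = [("majority", 0.9, 1), ("majority", 0.1, 0),
        ("minority", 0.6, 1), ("minority", 0.2, 0)]
print(balance_for_positive_class(data))  # {'majority': 0.9, 'minority': 0.6}
```

A model passing both checks would show matching score/rate pairs per group and near-equal positive-class means across groups; the abstract reports that only two of ten tested models met such criteria.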
"The analogy of computing"
Willard McCarty
Digital Scholarship in the Humanities (2024-01-21). doi:10.1093/llc/fqad104

The digital machine is analogical by design: with it, we construct models of phenomena that by definition of that term are necessarily partial approximations. For that reason, we learn more by conceiving of them as analogues rather than imperfect copies. As the foofaraw over AI would make clear to anyone who bothered to separate its strange wheat from the common chaff, analogy is key to the digital engine’s intellectual power, whether for good or for ill. (The one we must further, the other oppose, but in both cases, understand as fully as we are able.) Analogy is itself a Proteus, however, surfacing in different forms in different disciplines where the machine has found its applications. In the following essay, I chase it through a number of fields before returning to computing, with two examples of its application. I end with a brief note on worldmaking, which after all is what it’s all about, at whatever scale.
"AGREE: a new benchmark for the evaluation of distributional semantic models of ancient Greek"
Silvia Stopponi, Saskia Peels-Matthey, Malvina Nissim
Digital Scholarship in the Humanities (2024-01-15). doi:10.1093/llc/fqad087

Recent years have seen the application of Natural Language Processing, in particular language models, to the study of the semantics of ancient Greek, but little work has been done to create gold data for the evaluation of such models. In this contribution we introduce AGREE, the first benchmark for the intrinsic evaluation of semantic models of ancient Greek created from expert judgements. In the absence of native speakers, eliciting expert judgements to create a gold standard is a way to leverage the competence closest to that of native speakers. Moreover, this method allows data to be collected in a uniform way and precise instructions to be given to participants. Human judgements about word relatedness were collected via two questionnaires: in the first, experts provided related lemmas for a set of proposed seeds, while in the second, they assigned relatedness judgements to pairs of lemmas. AGREE was built from a selection of the collected data.
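A benchmark of this kind is typically used by rank-correlating a model's relatedness scores with the expert judgements for the same lemma pairs. The sketch below uses invented pairs and scores; the actual AGREE data and evaluation protocol may differ.

```python
# Sketch of intrinsic evaluation against human relatedness judgements:
# Spearman correlation between expert scores and model scores.

def spearman(xs, ys):
    """Spearman rank correlation (assumes no tied values, for brevity)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented scores for four hypothetical lemma pairs: the model agrees
# with the experts on the two most related pairs but swaps the last two.
expert = [4.0, 3.2, 1.5, 0.8]
model = [0.81, 0.64, 0.22, 0.35]
print(spearman(expert, model))  # → 0.8
```

A higher correlation means the model's notion of relatedness tracks the expert consensus more closely; in production one would use scipy.stats.spearmanr, which also handles ties.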
"Digitizing the USPTO patent backfile"
Simon Rowberry
Digital Scholarship in the Humanities (2024-01-15). doi:10.1093/llc/fqad096

The digitization of the US Patent and Trademark Office’s (USPTO) backfile of six million patents undertaken between 1951 and 2001 was a five-decade struggle, featuring several media transitions from print and microfilm to CD-ROMs and, finally, the Web. This mass digitization project is on a similar scale to Google Books and the Internet Archive, but it is rarely discussed within critical digitization scholarship or for its significance as a tool for knowledge production. In this article, I focus on the digital and physical material form of the USPTO’s patent documents and how the current paradigm of access and storage of the digital backfile emerged. Through this case study, I build upon Ian Milligan’s distinction between the ‘text’ and ‘platform’ layers of a digitization project to demonstrate how historical decisions regarding format and metadata continue to influence how users retrieve and interpret documents, such as patents, online.
"Mapping Germanness in early 20th century USA: topic modeling and GIS within a small corpus framework"
Sijie Wang, Maciej Kurzynski
Digital Scholarship in the Humanities (2024-01-11). doi:10.1093/llc/fqad102

The increased emphasis on language and ethnicity among German immigrants in the USA at the beginning of the 20th century resulted from inter-ethnic competition as well as assimilation pressures on Germans as a minority in American society. Following the unification of Germany and the improvement of Germany’s international status, Germans in America claimed the superiority of German culture; middle-class advocates attempted to build a more united German-American community, fighting for a stronger voice on issues such as prohibition and German language education. These processes eventually led to the establishment of the National German-American Alliance in Philadelphia in 1901. The present article employs topic modeling and GIS techniques to examine the little-known conference proceedings of the Alliance and discuss Prince Heinrich “Henry” of Prussia’s 1902 visit to the USA. On the humanities side, we foreground the dynamics of the German diaspora, who sought their own ethnic uniqueness and constructed historical memory during this period. On the digital side, we discuss different statistical evaluations of topic models as well as their applicability within a small corpus research framework.
"Unsigned play by Milan Kundera? An authorship attribution study"
Lenka Jungmannová, Petr Plecháč
Digital Scholarship in the Humanities (2024-01-11). doi:10.1093/llc/fqad109

In addition to being a widely recognized novelist, Milan Kundera has also authored three pieces for theatre: The Owners of the Keys (Majitelé klíčů 1961), The Blunder (Ptákovina 1967), and Jacques and his Master (Jakub a jeho pán 1971). In recent years, however, the hypothesis has been raised that Kundera was the true author of a fourth play, Juro Jánošík, first performed in a 1974 production under the name of Karel Steigerwald, who was Kundera’s student at the time. In this study, we make use of supervised machine learning to settle the question of authorship attribution in the case of Juro Jánošík, with results strongly supporting the hypothesis of Kundera’s authorship.
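The abstract does not specify the features or classifier used. One common stylometric setup, sketched below with invented toy strings in place of real texts, compares character n-gram profiles of a disputed text against each candidate author's known writing.

```python
# One common stylometric setup (not necessarily the authors' method):
# character n-gram frequency profiles compared by cosine similarity,
# attributing a disputed text to the closest candidate author.
from collections import Counter
import math

def profile(text, n=3):
    """Relative frequencies of character n-grams."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def cosine(p, q):
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm = lambda d: math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm(p) * norm(q))

def attribute(disputed, candidates):
    """candidates: {author_name: concatenated known texts}."""
    d = profile(disputed)
    scores = {a: cosine(d, profile(t)) for a, t in candidates.items()}
    return max(scores, key=scores.get)
```

A real study would use held-out validation, multiple feature sets, and a proper supervised classifier (e.g. an SVM); this sketch only shows the shape of the task.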
"The internal structure of medieval Latin legendaries: a computational analysis"
Sébastien de Valeriola, Bastien Dubuisson
Digital Scholarship in the Humanities (2024-01-11). doi:10.1093/llc/fqad097

Since the middle of the 17th century, scholars have been systematically describing the numerous medieval manuscripts preserved in libraries and religious institutions that contain hagiographic texts, that is, texts recounting the lives of saints. In this article, we apply quantitative tools to the resulting database to consider these codices from a new point of view. Specifically, we study their internal organization, that is, the order in which their texts are arranged. We first present a visualization tool that allows this structure to be grasped at a glance. Then, we describe a model, based on a constrained spline regression, that automatically classifies manuscripts according to their internal organization. The results of this classification task make it possible to identify manuscripts with a particular internal organization, called per circulum anni (following the course of the year), and thus to study their properties. Furthermore, they open up the possibility of obtaining clues regarding the origin of some codices and potential kinship links between them.
"Topic modelling literary interviews from The Paris Review"
Derek Greene, James O'Sullivan, Daragh O'Reilly
Digital Scholarship in the Humanities (2024-01-11). doi:10.1093/llc/fqad098

The interview has always proved a rich source for those hoping to better understand the figures behind a text, as well as the social contexts and writing practices which might have informed their aesthetic sentiments. Although research into the literary interview has made significant strides over the past two decades, both in how the genre is conceptualized and in how its emergence and development have been historically traced, the form remains somewhat neglected by literary and cultural theorists and scholars. There is also a remarkable absence of distant readings in this domain. With the rise of the digital humanities, particularly digital literary studies, one would expect more scholars to have used computer-assisted techniques to mine literary interviews, which are, in terms of dataset practicalities, somewhat ideal: semi-structured by nature and typically available online. Such is the question to which this article attends, taking as its dataset seven decades’ worth of literary interviews from The Paris Review, and ‘topic modelling’ these documents to determine the key themes that dominate such a culturally significant set of materials, while also exploring the value of topic modelling for socio-literary criticism.
"Using ontology to model time description in historical Chinese texts"
Linxu Wang, Jun Wang, Tong Wei
Digital Scholarship in the Humanities (2024-01-10). doi:10.1093/llc/fqad092

Temporal information plays a crucial role in historical research, as it enables scholars to gain insights into the events and processes that have shaped the past. However, the complexity and diversity of temporal descriptions found in Chinese historical texts pose significant challenges for analyzing and interpreting this information. This article addresses these challenges by introducing the traditional Chinese time ontology (TCT Ontology), which integrates relevant concepts and different timing methods into an ontology. The TCT Ontology comprises four classes (TCT Record, Chinese Calendar, Historical Interval, and Person) to represent time descriptions in Chinese texts. By separating time records from the traditional Chinese calendar, the ontology provides a reference model for understanding time information in Chinese historical archives and serves as a basis for converting those time records to the Gregorian calendar. This accurate conversion is critical for humanistic research in Chinese history, as it enables scholars to engage in meaningful reading, studying, and research of the historical record.
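The separation the abstract describes can be illustrated with a minimal sketch: this is not the TCT Ontology itself, just the idea of keeping the time record found in a text apart from the calendar system used to resolve it. The era start years used are real (Kangxi year 1 = 1662 CE, Qianlong year 1 = 1736 CE), but the two-entry table is purely illustrative.

```python
# Minimal sketch (not the TCT Ontology) of separating a time *record*
# from the calendar system, so records can be resolved to Gregorian
# years. Era start years are real; the table is a tiny illustration.

ERA_START = {"Kangxi": 1662, "Qianlong": 1736}  # reign era -> year 1 CE

def to_gregorian(era, year_in_era):
    """Resolve 'year N of reign era E' to a Gregorian year."""
    return ERA_START[era] + year_in_era - 1

# A record as it might appear in a text: "Kangxi year 3".
record = {"text": "康熙三年", "era": "Kangxi", "year": 3}
print(to_gregorian(record["era"], record["year"]))  # → 1664
```

Real conversion must also handle the lunisolar calendar's offset from Gregorian year boundaries and month/day resolution, which is why a dedicated ontology and reference model are needed rather than a lookup table.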
Review: The Digital Humanities and Literary Studies, by Martin Paul Eve
Reviewed by Tiping Su
Digital Scholarship in the Humanities (2024-01-09). doi:10.1093/llc/fqad095