
Digital Scholarship in the Humanities: Latest Articles

Finding common features in multilingual fake news: a quantitative clustering approach
IF 0.8 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY | Pub Date: 2024-04-03 | DOI: 10.1093/llc/fqae016
Wei Yuan, Haitao Liu
Since the Internet is a breeding ground for unconfirmed fake news, studies on the automatic detection and clustering of fake news have become crucial. Most current studies focus on English texts, and the common features of multilingual fake news have not been sufficiently studied. Therefore, this article uses English, Russian, and Chinese as examples and focuses on identifying the common quantitative features of fake news in different languages at the word, sentence, readability, and sentiment levels. These features are then used in principal component analysis, K-means clustering, hierarchical clustering, and two-step clustering experiments, which achieved satisfactory results. The common features we propose play a greater role in automatic cross-lingual clustering than the features proposed in previous studies. Simultaneously, we discovered a trend toward linguistic simplification and economy in fake news. Furthermore, fake news is easier to understand and uses negative emotional expressions in ways that real news does not. Our research provides new reference features for fake news detection tasks and facilitates research into the linguistic characteristics of fake news.
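A minimal sketch of the general workflow described above (quantitative text features followed by K-means clustering), in plain Python. The three features (mean word length, mean sentence length, type-token ratio) and the toy texts are illustrative assumptions, not the paper's feature set, and PCA is omitted. The sketch also assumes whitespace-delimited text; Chinese would first need a word segmenter.

```python
import math
import re


def text_features(text):
    """Illustrative features: mean word length, mean sentence length (words), type-token ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    return [
        sum(len(w) for w in words) / len(words),
        len(words) / len(sentences),
        len(set(words)) / len(words),
    ]


def standardize(rows):
    """z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has standard deviation 0; fall back to 1 to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented toy texts: two short, punchy items and two long, formal ones.
texts = [
    "Shock! They lie. Share now!",
    "The committee published its comprehensive quarterly assessment of regional infrastructure expenditure yesterday.",
    "Wow. Big news. Read it!",
    "Independent auditors subsequently confirmed that the ministry's procurement documentation complied with established regulations.",
]
labels = kmeans(standardize([text_features(t) for t in texts]), k=2)
print(labels)  # → [0, 1, 0, 1]
```

Standardizing before clustering matters here because sentence length (in words) spans a much larger numeric range than type-token ratio.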
Citations: 0
Retractions in arts and humanities: an analysis of the retraction notices
IF 0.8 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY | Pub Date: 2024-03-18 | DOI: 10.1093/llc/fqad093
Ivan Heibi, Silvio Peroni
The aim of this work is to understand the retraction phenomenon in the arts and humanities domain through an analysis of the retraction notices (formal documents stating and describing the retraction of a particular publication). The retractions and the corresponding notices are identified using the data provided by Retraction Watch. Our methodology for the analysis combines a metadata analysis and a content analysis (mainly performed using a topic modelling process) of the retraction notices. Considering 343 cases of retraction, we found that many retraction notices are neither identifiable nor findable. In addition, these were not always separated from the original papers, introducing ambiguity in understanding how these notices were perceived by the community (i.e. cited). Also, we noticed that there is no systematic way to write a retraction notice. Indeed, some retraction notices presented a complete discussion of the reasons for retraction, while others tended to be more direct and succinct. We also found many notices with similar text addressing different retractions. We think a further study with a larger collection should be done using the same methodology to confirm and investigate our findings further.
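The abstract notes that many notices share similar text while addressing different retractions. One simple way to flag such pairs, offered here as an assumption and not as the authors' topic-modelling method, is Jaccard similarity over word 3-gram "shingles". The 0.5 threshold and the example notices are hypothetical.

```python
import re
from itertools import combinations


def shingles(text, n=3):
    """Set of word n-grams ("shingles") in a lowercased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    union = a | b
    # Treat two empty shingle sets (texts shorter than n words) as not similar.
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(notices, threshold=0.5, n=3):
    """Return (id1, id2, similarity) for every pair of notices at or above the threshold."""
    sets = {nid: shingles(text, n) for nid, text in notices.items()}
    pairs = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, round(score, 2)))
    return pairs


# Invented example notices (hypothetical IDs "A", "B", "C").
notices = {
    "A": "This article has been retracted at the request of the editor because of duplicate publication.",
    "B": "This article has been retracted at the request of the editor because of plagiarism.",
    "C": "The authors withdrew the paper after errors were found in the data analysis.",
}
print(near_duplicates(notices))  # → [('A', 'B', 0.79)]
```

Notices A and B share boilerplate wording but give different reasons, which is exactly the kind of pair a shingle comparison surfaces for manual review.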
Citations: 0
Digital assemblages with AI for creative interpretation of short stories
IF 0.8 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY | Pub Date: 2024-03-06 | DOI: 10.1093/llc/fqad050
Kieran O'Halloran
I demonstrate an approach fostering inventive interpretation of short stories in Literary Studies and higher education generally. It involves constructing an ‘assemblage’: at its simplest, an evolving network of unusual connections for creative outcome. The assemblage of this article combines freshly located research literature, directly and indirectly related to a story’s themes, and/or the personality type of protagonists. Importantly, this assemblage also utilizes text analysis software revealing the relatively invisible (e.g. (in)frequent words, parts of speech, and topics) and Large Language Model (LLM) Generative AI to enrich the interpretation. The use of all these elements helps productively exceed initial intuitions about the story, facilitating creativity. I model the approach using Edgar Allan Poe’s short story, The Black Cat, whose protagonist is a homicidal psychopath. Specifically, the assemblage here includes relevant software-based research (a corpus analysis of homicidal psychopathic language), non-software-based research (psychoanalytical literary criticism of The Black Cat using the empirically validated concept of transference), text analysis software (WMatrix and Datayze), and the LLM Generative AI, ‘ChatGPT’ (using the freely available LLM GPT-3.5). One use of this approach is as a pedagogy in Literary Studies employing text analysis software (e.g. on a digital stylistics course). Yet given that creative adaptability is a key 21st-century skill, that digital literacy (including the use of Generative AI) is an important contemporary competence, and that the short story genre is universally known, I also highlight the utility of this approach as a university-wide pedagogy for enhancing creative thinking.
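The abstract credits text analysis software (WMatrix, Datayze) with surfacing frequent and infrequent words. As a rough, tool-independent illustration (not how those tools work internally), a frequency list plus its hapax legomena (words occurring exactly once) takes only a few lines; the sample sentence is invented.

```python
import re
from collections import Counter


def frequency_profile(text, top=5):
    """Return the `top` most frequent words and the hapax legomena (words seen once)."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    frequent = counts.most_common(top)  # ties keep first-seen order
    hapaxes = sorted(word for word, c in counts.items() if c == 1)
    return frequent, hapaxes


sample = "The cat sat by the wall and the cat slept"  # invented sentence
frequent, hapaxes = frequency_profile(sample, top=2)
print(frequent)  # → [('the', 3), ('cat', 2)]
print(hapaxes)   # → ['and', 'by', 'sat', 'slept', 'wall']
```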
Citations: 0
Using deep learning to analyse the times of the UN Security Council
IF 0.8 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY | Pub Date: 2024-02-29 | DOI: 10.1093/llc/fqae009
Tobias Blanke
This article analyses how digital humanities scholarship can make use of recent advances in deep learning to analyse the temporal relations in an online textual archive. We use transfer learning as well as data augmentation techniques to investigate changes in United Nations Security Council resolutions. Instead of pre-defined periods, as is common, we target the years directly. As far as we can see, such a text regression task is novel in the digital humanities, and it has the advantage of speaking directly to historical relations. We not only present very good experimental results but also demonstrate how such text regressions can be interpreted directly and with surrogate topic models.
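The paper fine-tunes deep models via transfer learning, which is beyond a short example. The sketch below substitutes a much simpler technique, k-nearest-neighbour regression over bag-of-words cosine similarity, only to illustrate what "targeting the years directly" means: the model outputs a year, and error is measured in years (mean absolute error) rather than as period-label accuracy. The training snippets and their years are invented.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_year(text, train, k=2):
    """kNN regression: mean year of the k training texts most similar to `text`."""
    q = bow(text)
    ranked = sorted(train, key=lambda pair: cosine(q, bow(pair[0])), reverse=True)
    top = ranked[:k]
    return sum(year for _, year in top) / len(top)


def mean_absolute_error(preds, golds):
    """Average error in years."""
    return sum(abs(p - g) for p, g in zip(preds, golds)) / len(golds)


# Invented (text, year) pairs standing in for resolution excerpts.
train = [
    ("situation in the middle east ceasefire", 1973),
    ("ceasefire in the middle east", 1974),
    ("threats to international peace caused by terrorist acts", 2001),
    ("terrorist acts threaten international peace", 2002),
]
print(predict_year("terrorist acts and international peace", train))  # → 2001.5
```

Because the output is continuous, a prediction that is off by one year is penalized far less than one that is off by thirty, which a period classifier cannot express.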
Citations: 0
What drives non-linguists’ hands (or mouse) when drawing mental dialect maps?
IF 0.8 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY | Pub Date: 2024-02-27 | DOI: 10.1093/llc/fqae003
Péter Jeszenszky, Carina Steiner, Nina von Allmen, Adrian Leemann
In perceptual dialectology, mental mapping is a popular tool for eliciting attitudes and the spatial imprint of linguistic cognition from non-linguists by asking them to draw linguistic variation on maps. Despite the popularity of this method, research on the geometrical parameters of the shapes drawn on these maps has been limited. In our study, we utilized 500 mental maps, both digital and hand-drawn, introducing a new digital implementation for mental mapping (source code available). Our contribution presents the first perceptual dialectological outcomes of the ‘Swiss German Dialects in Time and Space’ project, which recorded a socio-demographically balanced corpus containing a large amount of quantitative personal data about participants who represent the entire Swiss German dialect continuum. Our first research question explores how various sociolinguistic variables and other variables related to personal background influence the geometrical parameters of shapes drawn, such as the number of shapes, their coverage of the language area, and their compactness. Statistical modelling reveals that dialect identity plays the most important role, while educational background, urbanity, and regional differences also affect more parameters. The second research question investigates the comparability between hand-drawn and digital mental maps, showing that they are generally comparable in terms of geometrical aspects, with minor limitations due to specific technical considerations in our digital method.
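The abstract lists coverage and compactness among the geometrical parameters but does not say how they were computed. As an assumption, the sketch below uses the Polsby-Popper score (4πA/P², equal to 1 for a circle and lower for elongated or jagged outlines), a common compactness measure, and computes coverage as total drawn area over the language-area size. It assumes polygons in projected planar coordinates (not raw latitude/longitude) and non-overlapping shapes.

```python
import math


def polygon_area(pts):
    """Shoelace formula; pts is a list of (x, y) vertices in drawing order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def perimeter(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:] + pts[:1]))


def polsby_popper(pts):
    """Compactness 4*pi*A / P**2: 1.0 for a circle, smaller for stretched shapes."""
    return 4 * math.pi * polygon_area(pts) / perimeter(pts) ** 2


def coverage(shapes, language_area):
    """Share of the language area covered; assumes shapes do not overlap and lie inside it."""
    return sum(polygon_area(s) for s in shapes) / language_area


square = [(0, 0), (10, 0), (10, 10), (0, 10)]   # area 100, perimeter 40
strip = [(0, 0), (40, 0), (40, 2.5), (0, 2.5)]  # same area, much longer perimeter
print(round(polsby_popper(square), 3))  # → 0.785 (that is, pi / 4)
print(round(polsby_popper(strip), 3))
print(coverage([square], 400.0))        # → 0.25
```

Both test shapes enclose the same area, so the difference in score comes entirely from the outline's length, which is what makes the measure useful for comparing hand-drawn and mouse-drawn regions.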
Citations: 0
Gender relations in Spanish theatre during the Silver Age: a quantitative comparison of works in the Spanish Drama Corpus
IF 0.8 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY | Pub Date: 2024-02-27 | DOI: 10.1093/llc/fqae007
Monika Dabrowska, María Teresa Santa María Fernández
One of the many changes witnessed by Spanish society at the beginning of the 20th century was the early reshaping of the role of women, including in the realm of theatre. During the first three decades of the new century, Spanish theatre was thriving, favouring the emergence of new gender roles: there were new female playwrights, professional actresses, stage designers, costume designers, theatre company directors, etc. Against this background of awakening female consciousness, it is worth exploring whether the growing position of women in public life goes hand in hand with a greater presence of female characters in the plays composed at that time. With a view to assessing the position of women in playwriting in the Silver Age of Spanish literature, twenty-five stage plays written by nine playwrights between 1878 and 1936, taken from the Spanish Drama Corpus (part of the DraCor project), have been analysed. The distribution of male and female protagonists on stage and the influence of female presence in dramatic conflict have been traced based on quantitative textual factors. The study thus tests the potential of quantitative methods and their scope for the structural analysis of plays and studies on dramatic corpora from a gender perspective.
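A sketch of one quantitative factor of the kind the abstract mentions: each gender's share of speech acts and of spoken words. It assumes the speeches have already been extracted (for example from DraCor's TEI files) into (speaker, text) pairs and that speaker gender is available as a lookup table; the speaker IDs and dialogue are invented, and the paper's actual measures may differ.

```python
from collections import defaultdict


def gender_shares(speeches, gender_of):
    """Return {gender: (share of speech acts, share of spoken words)}.

    speeches: list of (speaker_id, text) pairs in play order (assumed pre-extracted).
    gender_of: dict mapping speaker_id to a gender label; unknown speakers get "UNKNOWN".
    """
    acts = defaultdict(int)
    words = defaultdict(int)
    for speaker, text in speeches:
        g = gender_of.get(speaker, "UNKNOWN")
        acts[g] += 1
        words[g] += len(text.split())  # naive whitespace tokenization
    total_acts = sum(acts.values())
    total_words = sum(words.values())
    return {g: (acts[g] / total_acts, words[g] / total_words) for g in acts}


# Invented three-line exchange with hypothetical speaker IDs.
speeches = [
    ("juan", "Buenos días, señora."),
    ("maria", "No tengo tiempo para usted hoy."),
    ("juan", "Entonces me voy."),
]
shares = gender_shares(speeches, {"juan": "MALE", "maria": "FEMALE"})
print(shares)
```

Reporting both measures side by side matters: here the male character speaks twice as often, yet both genders account for half of the spoken words.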
Citations: 0
Disentangling semantic and prosodic features of English poetry
IF 0.8 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY | Pub Date: 2024-02-27 | DOI: 10.1093/llc/fqae008
Wenyi Shang, Ted Underwood
The distinction between genre and form is still contested in literary studies. While scholars associated with the New Formalism are criticized for perceiving everything as a form, digital humanists tend to argue that everything is a genre. In this research, we employed machine learning models to classify 36,635 English poems in the Chadwyck-Healey Literature Collections into twenty-seven categories, focusing on their semantic features (lexicons) and prosodic features (meters and rhymes) independently. Our findings reveal that different categories of poetry are distinguished by different groups of characteristics, without a clear-cut division between those driven predominantly by semantic features and those driven predominantly by prosodic features. Instead, poetry categories manifest a combination of semantic and prosodic elements, spanning a spectrum of different strengths in both domains. These findings suggest that the colloquial distinction between “genre” and “form” is based on real differences between poetic categories, although those differences may not be quite as crisply binary as the vocabulary implies.
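The study's key design choice is to classify poems with semantic and prosodic features independently and compare the results. The sketch below shows that design with a nearest-centroid classifier and leave-one-out accuracy, run once per feature view. The classifier is a stand-in, not the paper's models, and the two-dimensional feature vectors and category labels are invented.

```python
import math


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier on one feature view."""
    correct = 0
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        labels = sorted({lab for _, lab in train})
        cents = {lab: centroid([x for x, lb in train if lb == lab]) for lab in labels}
        pred = min(labels, key=lambda lab: math.dist(x_i, cents[lab]))
        correct += pred == y_i
    return correct / len(X)


# Invented toy vectors, three poems per category.
y = ["sonnet"] * 3 + ["elegy"] * 3
# Hypothetical semantic view, e.g. rates of two lexical fields per poem.
semantic = [[0.1, 0.8], [0.2, 0.7], [0.1, 0.9], [0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]
# Hypothetical prosodic view, e.g. share of iambic lines and rhyme density.
prosodic = [[0.9, 0.5], [0.5, 0.9], [0.7, 0.7], [0.7, 0.7], [0.9, 0.5], [0.5, 0.9]]

print("semantic view:", loo_accuracy(semantic, y))  # → semantic view: 1.0
print("prosodic view:", loo_accuracy(prosodic, y))
```

In this toy data the categories are separable by vocabulary but not by prosody; with real data, as the abstract reports, most categories fall somewhere between the two extremes.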
Citations: 0
Whose Anthropocene?: a data-driven look at the prospects for collaboration between natural science, social science, and the humanities
IF 0.8 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY | Pub Date: 2024-02-09 | DOI: 10.1093/llc/fqae004
Carlos Santana, Kathryn Petrozzo, T J Perkins
Although the idea of the Anthropocene originated in the earth sciences, there have been increasing calls for questions about the Anthropocene to be addressed by pan-disciplinary groups of researchers from across the natural sciences, social sciences, and humanities. We use data analysis techniques from corpus linguistics to examine academic texts about the Anthropocene from these disciplinary families. We interpret the data as suggesting that barriers to a broadly interdisciplinary study of the Anthropocene are high, but we are also able to identify some areas of common ground that could serve as interdisciplinary bridges.
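The abstract does not name the specific corpus-linguistic techniques used. One standard technique for comparing disciplinary corpora is keyness, often scored with Dunning's log-likelihood (G²); the sketch below implements it as an illustration under that assumption, with hypothetical counts.

```python
import math


def log_likelihood(a, b, c, d):
    """Dunning's log-likelihood (G2) keyness.

    a, b: frequency of a word in corpus 1 and corpus 2
    c, d: total number of tokens in corpus 1 and corpus 2
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    g2 = 0.0
    if a:
        g2 += a * math.log(a / e1)  # a term with a zero count contributes 0
    if b:
        g2 += b * math.log(b / e2)
    return 2 * g2


# Hypothetical counts: a term seen 30 times in 10,000 humanities tokens
# and 10 times in 10,000 natural-science tokens.
print(round(log_likelihood(30, 10, 10_000, 10_000), 2))  # → 10.46
```

A score above 3.84 (the chi-square critical value for one degree of freedom at p < 0.05) is conventionally read as a significant difference in relative frequency between the two corpora.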
Citations: 0
Understanding poetry using natural language processing tools: a survey
IF 0.8 | Zone 3 (Literature) | HUMANITIES, MULTIDISCIPLINARY | Pub Date: 2024-02-07 | DOI: 10.1093/llc/fqae001
Mirella De Sisto, Laura Hernández-Lorenzo, Javier De la Rosa, Salvador Ros, Elena González-Blanco
Analyzing poetry with automatic tools has great potential for improving verse-related research. Over the last few decades, this field has expanded notably and a large number of tools aiming at analyzing various aspects of poetry have been developed. However, the concrete connection between these tools and traditional scholars investigating poetry and metrics is often missing. The purpose of this article is to bridge this gap by providing a comprehensive survey of the automatic poetry analysis tools available for European languages. The tools are described and classified according to the language for which they are primarily developed, and to their functionalities and purpose. Particular attention is given to those that have open-source code or provide an online version with the same functionality. Combining more traditional research with these tools has clear advantages: it provides the opportunity to address theoretical questions with the support of large amounts of data; also, it allows for the development of new and diversified approaches.
Citations: 0
Linguistic annotation of cuneiform texts using treebanks and deep learning
IF 0.8 · Zone 3 (Literature) · HUMANITIES, MULTIDISCIPLINARY · Pub Date: 2024-02-02 · DOI: 10.1093/llc/fqae002
Matthew Ong, Shai Gordin
We describe an efficient pipeline, which takes advantage of bootstrapping techniques, for morpho-syntactically annotating an ancient-language corpus. This pipeline is designed for ancient-language scholars looking to jump-start their own treebank projects, which can in turn serve further pedagogical research projects in the target language. We situate our work in the field of similar ancient-language treebank projects, arguing that our approach shows that individual humanities scholars can leverage current machine-learning tools to produce their own richly annotated corpora. We illustrate this pipeline by producing a new Akkadian-language treebank based on two volumes from the online editions of the State Archives of Assyria project hosted on Oracc, as well as a spaCy language model named AkkParser trained on that treebank. Both are publicly available for annotating other Akkadian corpora. In addition, we discuss linguistic issues particular to the Neo-Assyrian letter corpus and the data-encoding complications of cuneiform texts in Oracc. The strategies, language models, and processing scripts we developed to handle both linguistic and data-encoding issues in this project will be of special interest to scholars seeking to develop their own cuneiform treebanks.
Citations: 0
Journal: Digital Scholarship in the Humanities