Supporting the corpus-based study of Shakespeare’s language: Enhancing a corpus of the First Folio

ICAME journal: computers in English linguistics · Pub Date: 2021-05-01 · Pages: 37–86 · DOI: 10.2478/icame-2021-0002

Jonathan Culpeper, A. Hardie, J. Demmen, Jennifer Hughes, Matt Timperley


Citations: 3



Abstract

This article explores challenges in the corpus linguistic analysis of Shakespeare’s language, and Early Modern English more generally, with particular focus on elaborating possible solutions and the benefits they bring. An account of work that took place within the Encyclopedia of Shakespeare’s Language Project (2016–2019) is given, which discusses the development of the project’s data resources, specifically, the Enhanced Shakespearean Corpus. Topics covered include the composition of the corpus and its subcomponents; the structure of the XML markup; the design of the extensive character metadata; and the word-level corpus annotation, including spelling regularisation, part-of-speech tagging, lemmatisation and semantic tagging. The challenges that arise from each of these undertakings are not exclusive to a corpus-based treatment of Shakespeare’s plays but it is in the context of Shakespeare’s language that they are so severe as to seem almost insurmountable. The solutions developed for the Enhanced Shakespearean Corpus – often combining automated manipulation with manual interventions, and always principled – offer a way through.


Source journal

ICAME journal : computers in English linguistics

Self-citation rate

0.00%

Articles published

Review time

32 weeks