支持基于语料库的莎士比亚语言研究:加强《第一对开本》的语料库

Jonathan Culpeper, A. Hardie, J. Demmen, Jennifer Hughes, Matt Timperley
{"title":"支持基于语料库的莎士比亚语言研究:加强《第一对开本》的语料库","authors":"Jonathan Culpeper, A. Hardie, J. Demmen, Jennifer Hughes, Matt Timperley","doi":"10.2478/icame-2021-0002","DOIUrl":null,"url":null,"abstract":"Abstract This article explores challenges in the corpus linguistic analysis of Shakespeare’s language, and Early Modern English more generally, with particular focus on elaborating possible solutions and the benefits they bring. An account of work that took place within the Encyclopedia of Shakespeare’s Language Project (2016–2019) is given, which discusses the development of the project’s data resources, specifically, the Enhanced Shakespearean Corpus. Topics covered include the composition of the corpus and its subcomponents; the structure of the XML markup; the design of the extensive character metadata; and the word-level corpus annotation, including spelling regularisation, part-of-speech tagging, lemmatisation and semantic tagging. The challenges that arise from each of these undertakings are not exclusive to a corpus-based treatment of Shakespeare’s plays but it is in the context of Shakespeare’s language that they are so severe as to seem almost insurmountable. The solutions developed for the Enhanced Shakespearean Corpus – often combining automated manipulation with manual interventions, and always principled – offer a way through.","PeriodicalId":73271,"journal":{"name":"ICAME journal : computers in English linguistics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Supporting the corpus-based study of Shakespeare’s language: Enhancing a corpus of the First Folio\",\"authors\":\"Jonathan Culpeper, A. Hardie, J. Demmen, Jennifer Hughes, Matt Timperley\",\"doi\":\"10.2478/icame-2021-0002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract This article explores challenges in the corpus linguistic analysis of Shakespeare’s language, and Early Modern English more generally, with particular focus on elaborating possible solutions and the benefits they bring. An account of work that took place within the Encyclopedia of Shakespeare’s Language Project (2016–2019) is given, which discusses the development of the project’s data resources, specifically, the Enhanced Shakespearean Corpus. Topics covered include the composition of the corpus and its subcomponents; the structure of the XML markup; the design of the extensive character metadata; and the word-level corpus annotation, including spelling regularisation, part-of-speech tagging, lemmatisation and semantic tagging. The challenges that arise from each of these undertakings are not exclusive to a corpus-based treatment of Shakespeare’s plays but it is in the context of Shakespeare’s language that they are so severe as to seem almost insurmountable. The solutions developed for the Enhanced Shakespearean Corpus – often combining automated manipulation with manual interventions, and always principled – offer a way through.\",\"PeriodicalId\":73271,\"journal\":{\"name\":\"ICAME journal : computers in English linguistics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICAME journal : computers in English linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2478/icame-2021-0002\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICAME journal : computers in English linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/icame-2021-0002","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

摘要:本文探讨了莎士比亚语言和早期现代英语语料库语言学分析中面临的挑战,并重点阐述了可能的解决方案及其带来的好处。本文介绍了在《莎士比亚语言百科全书》项目(2016-2019)中进行的工作,其中讨论了该项目数据资源的开发,特别是增强型莎士比亚语料库。涵盖的主题包括语料库及其子组件的组成;XML标记的结构;扩展字符元数据的设计;词级语料库标注,包括拼写规则化、词性标注、词源化和语义标注。这些挑战并不仅限于基于语料库的莎士比亚戏剧处理,但在莎士比亚的语言背景下,这些挑战是如此严峻,以至于几乎无法克服。为增强型莎士比亚语料库开发的解决方案——通常将自动操作与人工干预相结合,并且始终具有原则性——提供了一种解决方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Supporting the corpus-based study of Shakespeare’s language: Enhancing a corpus of the First Folio
Abstract This article explores challenges in the corpus linguistic analysis of Shakespeare’s language, and Early Modern English more generally, with particular focus on elaborating possible solutions and the benefits they bring. An account of work that took place within the Encyclopedia of Shakespeare’s Language Project (2016–2019) is given, which discusses the development of the project’s data resources, specifically, the Enhanced Shakespearean Corpus. Topics covered include the composition of the corpus and its subcomponents; the structure of the XML markup; the design of the extensive character metadata; and the word-level corpus annotation, including spelling regularisation, part-of-speech tagging, lemmatisation and semantic tagging. The challenges that arise from each of these undertakings are not exclusive to a corpus-based treatment of Shakespeare’s plays but it is in the context of Shakespeare’s language that they are so severe as to seem almost insurmountable. The solutions developed for the Enhanced Shakespearean Corpus – often combining automated manipulation with manual interventions, and always principled – offer a way through.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
审稿时长
32 weeks
期刊最新文献
Ole Schützler and Julia Schlüter (eds.). Data and methods in corpus linguistics. Comparative approaches. Cambridge: Cambridge University Press, 2022. 357 pp. ISBN 978-1-10849964-4 Compiling a corpus of South Asian online Englishes: A report, some reflections and a pilot study A comparative corpus-based investigation of results sections of research articles in Applied Linguistics and Physics Tony McEnery and Vaclav Brezina. Fundamental principles of corpus linguistics. Cambridge: Cambridge University Press, 2022. 313 pp. ISBN 978-1-1071-1062-5 Gender and evaluation in contemporary American English: A corpus study based on pronominal and nominal expressions with male and female reference
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1