Early Modern Multiloquent Authors (EMMA): Designing a large-scale corpus of individuals’ languages

P. Petré, Lynn Anthonissen, Sara Budts, Enrique Manjavacas, Emma-Louise Silva, William H. Standing, Odile A. O. Strik
ICAME Journal: Computers in English Linguistics. Published 2019-03-01. DOI: https://doi.org/10.2478/icame-2019-0004
Citations: 21

Abstract

The present article provides a detailed description of the corpus of Early Modern Multiloquent Authors (EMMA), as well as two small case studies that illustrate its benefits. As a large-scale specialized corpus, EMMA tries to strike the right balance between big data and sociolinguistic coverage. It comprises the writings of 50 carefully selected authors across five generations, mostly drawn from 17th-century London society. EMMA enables the study of language as both a social and a cognitive phenomenon and allows us to explore the interaction between the individual and aggregate levels. The first part of the article is a detailed description of EMMA’s first release as well as the sociolinguistic and methodological principles that underlie its design and compilation. We cover the conceptual decisions and practical implementations at various stages of the compilation process: from text markup, encoding and data preprocessing to metadata enrichment and verification. In the second part, we present two small case studies to illustrate how rich contextualization can guide the interpretation of quantitative corpus-linguistic findings. The first case study compares the past tense formation of strong verbs in writers without access to higher education with that of writers with extensive training in Latin. The second case study relates s/th-variation in the language of a single writer, Margaret Cavendish, to major shifts in her personal life.
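To make the second case study more concrete, the following is a minimal sketch of how s/th-variation in one writer's dated texts might be quantified. It is not the authors' method: the function name `s_share_by_year`, the restriction to the pairs hath/has and doth/does, and the sample sentences and years are all assumptions made for illustration, and nothing here is taken from EMMA itself.

```python
import re
from collections import defaultdict

# Hypothetical variant list: maps each form to its ending type.
# Restricted to two high-frequency verbs so no morphological analysis is needed.
VARIANTS = {"hath": "th", "doth": "th", "has": "s", "does": "s"}


def s_share_by_year(texts):
    """Return {year: share of -s forms} for an iterable of (year, text) pairs."""
    counts = defaultdict(lambda: {"s": 0, "th": 0})
    for year, text in texts:
        # Lowercase and keep alphabetic tokens only.
        for token in re.findall(r"[a-z]+", text.lower()):
            kind = VARIANTS.get(token)
            if kind is not None:
                counts[year][kind] += 1
    shares = {}
    for year, c in sorted(counts.items()):
        total = c["s"] + c["th"]
        if total:
            shares[year] = c["s"] / total
    return shares


# Invented toy sentences and years, not corpus data.
sample = [
    (1653, "She hath a mind that doth wander, and he hath none."),
    (1668, "She has a mind that does wander, though he hath none."),
]
print(s_share_by_year(sample))
```

A per-year share of this kind could then be set against biographical metadata, which is the sort of contextualization the abstract describes; a real study would need a far fuller inventory of verb forms and spelling variants.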