ICAME journal : computers in English linguistics: latest articles

Better data for more researchers – using the audio features of BNCweb
Pub Date : 2021-05-01 DOI: 10.2478/icame-2021-0004
S. Hoffmann, Sabine Arndt-Lappe
Abstract In spite of the wide agreement among linguists as to the significance of spoken language data, actual speech data have not formed the basis of empirical work on English as much as one would think. The present paper is intended to contribute to changing this situation, on a theoretical and on a practical level. On a theoretical level, we discuss different research traditions within (English) linguistics. Whereas speech data have become increasingly important in various linguistic disciplines, major corpora of English developed within the corpus-linguistic community, carefully sampled to be representative of language usage, are usually restricted to orthographic transcriptions of spoken language. As a result, phonological phenomena have remained conspicuously understudied within traditional corpus linguistics. At the same time, work with current speech corpora often requires a considerable level of specialist knowledge and tailor-made solutions. On a practical level, we present a new feature of BNCweb (Hoffmann et al. 2008), a user-friendly interface to the British National Corpus, which gives users access to audio and phonemic transcriptions of more than five million words of spontaneous speech. With the help of a pilot study on the variability of intrusive r we illustrate the scope of the new possibilities.
Pages: 125-154
Citations: 4
Supporting the corpus-based study of Shakespeare’s language: Enhancing a corpus of the First Folio
Pub Date : 2021-05-01 DOI: 10.2478/icame-2021-0002
Jonathan Culpeper, A. Hardie, J. Demmen, Jennifer Hughes, Matt Timperley
Abstract This article explores challenges in the corpus linguistic analysis of Shakespeare’s language, and Early Modern English more generally, with particular focus on elaborating possible solutions and the benefits they bring. An account of work that took place within the Encyclopedia of Shakespeare’s Language Project (2016–2019) is given, which discusses the development of the project’s data resources, specifically, the Enhanced Shakespearean Corpus. Topics covered include the composition of the corpus and its subcomponents; the structure of the XML markup; the design of the extensive character metadata; and the word-level corpus annotation, including spelling regularisation, part-of-speech tagging, lemmatisation and semantic tagging. The challenges that arise from each of these undertakings are not exclusive to a corpus-based treatment of Shakespeare’s plays but it is in the context of Shakespeare’s language that they are so severe as to seem almost insurmountable. The solutions developed for the Enhanced Shakespearean Corpus – often combining automated manipulation with manual interventions, and always principled – offer a way through.
Pages: 37-86
Citations: 3
Comparing written Indian Englishes with the new Corpus of Regional Indian Newspaper Englishes (CORINNE)
Pub Date : 2021-05-01 DOI: 10.2478/icame-2021-0006
Asya Yurchenko, S. Leuckert, C. Lange
Abstract This article introduces the new Corpus of Regional Indian Newspaper Englishes (CORINNE). The current version of CORINNE contains news and other text types from regional Indian newspapers published between 2015 and 2020, covering 13 states and regions so far. The corpus complements previous corpora, such as the Indian component of the International Corpus of English (ICE) as well as the Indian section of the South Asian Varieties of English (SAVE) corpus, by giving researchers the opportunity to analyse and compare regional (written) Englishes in India. In the first sections of the paper we discuss the rationale for creating CORINNE as well as the development of the corpus. We stress the potential of CORINNE and go into detail about selection criteria for the inclusion of newspapers as well as corpus compilation and the current word count. In order to show the potential of the corpus, the paper presents a case study of ‘intrusive as’, a syntactic feature that has made its way into formal registers of Indian English. Based on two subcorpora covering newspapers from Tamil Nadu and Uttarakhand, we compare frequencies and usage patterns of call (as) and term (as). The case study lends further weight to the hypothesis that the presence or absence of a quotative in the majority language spoken in an Indian state has an impact on the frequency of ‘intrusive as’. Finally, we foreshadow the next steps in the development of CORINNE as well as potential studies that can be carried out using the corpus.
Pages: 179-205
Citations: 1
Evidentiality in gendered styles in spoken English
Pub Date : 2020-03-01 DOI: 10.2478/icame-2020-0001
E. Söderqvist
Pages: 5-35
Citations: 5
Corpus linguistics and the description of English
Pub Date : 2020-03-01 DOI: 10.2478/icame-2020-0006
Stefan Diemer
Pages: 105-109
Citations: 0
Applications of pattern-driven methods in corpus linguistics
Pub Date : 2020-03-01 DOI: 10.2478/icame-2020-0005
C. Geisler
Pages: 102-104
Citations: 0
Corpus linguistics and African Englishes
Pub Date : 2020-03-01 DOI: 10.2478/icame-2020-0004
Frederic Zähres
Pages: 97-101
Citations: 1
Issues and challenges in compiling a corpus of Early Modern English plays for comparison with those of William Shakespeare
Pub Date : 2020-03-01 DOI: 10.2478/icame-2020-0002
J. Demmen
Abstract In this article I discuss the issues and challenges of compiling a corpus of historical plays by a range of playwrights that is highly suitable for use in comparative, corpus-based research into language style in Shakespeare’s plays. In discussing sources for digitised historical play-texts and criteria for making a selection for the present study, I argue that not just any set of Early Modern English plays constitutes a suitable basis upon which to make reliable claims about language style in Shakespeare’s plays relative to those of his peers. I point out factors outside of authorial choice which potentially have bearing on language style, such as sub-genre features and change over time. I also highlight some particular difficulties in compiling a corpus of historical texts, notably dating and spelling variation, and I explain how these were addressed. The corpus detailed in this article extends the prospects for investigating Shakespeare’s language style by providing a context into which it can be set and, as I indicate, is a valuable new publicly accessible resource for future research.
Pages: 37-68
Citations: 8
There’s more to alternations than the main diagonal of a 2×2 confusion matrix: Improvements of MuPDAR and other classificatory alternation studies
Pub Date : 2020-03-01 DOI: 10.2478/icame-2020-0003
S. Gries, Sandra C. Deshors
Abstract Corpus-based studies of learner language and (especially) English varieties have become more quantitative in nature and increasingly use regression-based methods and classifiers such as classification trees, random forests, etc. One recent development more widely used is the MuPDAR (Multifactorial Prediction and Deviation Analysis using Regressions) approach of Gries and Deshors (2014) and Gries and Adelman (2014). This approach attempts to improve on traditional regression- or tree-based approaches by, firstly, training a model on the reference speakers (often native speakers (NS) in learner corpus studies or British English speakers in variety studies), then, secondly, using this model to predict what such a reference speaker would produce in the situation the target speaker is in (often non-native speakers (NNS) or indigenized-variety speakers). Crucially, the third step then consists of determining whether the target speakers made a canonical choice or not and explore that variability with a second regression model or classifier. Both regression-based modeling in general and MuPDAR in particular have led to many interesting results, but we want to propose two changes in perspective on the results they produce. First, we want to focus attention on the middle ground of the prediction space, i.e. the predictions of a regression/classifier that, essentially, are made non-confidently and translate into a statement such as ‘in this context, both/all alternants would be fine’. Second, we want to make a plug for a greater attention to misclassifications/-predictions and propose a method to identify those as well as discuss what we can learn from studying them. We exemplify our two suggestions based on a brief case study, namely the dative alternation in native and learner corpus data.
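The three MuPDAR steps named in the abstract above, plus the two proposed shifts in attention (non-confident "middle ground" predictions and mispredictions), can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and none of it comes from the paper: the toy data, the two binary predictors, the hand-rolled logistic regression, the 0.15 middle-ground band, and the deviation score (0 for a canonical choice, otherwise the distance of the reference-model probability from 0.5).

```python
import math

def predict_proba(w, x):
    """Probability of outcome 1 under a logistic model (w[0] is the intercept)."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Step 1: fit a logistic regression on reference-speaker data
    using plain batch gradient descent (stdlib only)."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def mupdar_rows(w, X_target, y_target, band=0.15):
    """Steps 2 and 3: apply the reference model to target-speaker contexts,
    then record whether each target choice was canonical, how far it deviates,
    and whether the prediction itself was non-confident (middle ground)."""
    rows = []
    for x, y in zip(X_target, y_target):
        p = predict_proba(w, x)
        predicted = 1 if p >= 0.5 else 0
        canonical = predicted == y
        rows.append({
            "p_ref": p,
            "canonical": canonical,
            # Hypothetical deviation score, not necessarily the authors' definition.
            "deviation": 0.0 if canonical else abs(p - 0.5),
            "middle_ground": abs(p - 0.5) < band,
        })
    return rows

# Hypothetical predictors: [recipient_is_pronoun, theme_is_long];
# outcome 1 = ditransitive, 0 = prepositional dative.
X_ref = [[1, 1], [1, 1], [1, 1], [1, 0], [1, 0],
         [0, 1], [0, 1], [0, 0], [0, 0], [0, 0]]
y_ref = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

X_tgt = [[1, 1], [1, 1], [0, 0], [1, 0]]
y_tgt = [1, 0, 0, 1]

w_ref = fit_logistic(X_ref, y_ref)
rows = mupdar_rows(w_ref, X_tgt, y_tgt)
for row in rows:
    print(row)
```

In a real study the rows would feed a second regression or classifier, with the deviation (or the canonical flag) as the response; the middle-ground flag simply marks contexts where the reference model itself treats both alternants as roughly equally acceptable.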
Pages: 69-96
Citations: 3
AusBrown: A new diachronic corpus of Australian English
Pub Date : 2019-03-01 DOI: 10.2478/icame-2019-0001
P. Collins, Xinyue Yao
Abstract This paper presents a newly-compiled diachronic corpus of Australian English (AusBrown). With four sampling time points (1931, 1961, 1991 and 2006), AusBrown is designed to match the current suite of British and American ‘Brown-family’ corpora in both sampling year and design. We provide details of the composition and compilation of AusBrown, and explore the broader context of its ‘Brown-family background’ and of complementary Australian corpora. We also overview research based on the Australian corpora presented, including several AusBrown-based papers.
Pages: 5-21
Citations: 3