There’s more to alternations than the main diagonal of a 2×2 confusion matrix: Improvements of MuPDAR and other classificatory alternation studies

ICAME Journal: Computers in English Linguistics. Pub Date: 2020-03-01. Pages: 69-96. DOI: 10.2478/icame-2020-0003

S. Gries, Sandra C. Deshors


Citations: 3




Abstract

Corpus-based studies of learner language and (especially) English varieties have become more quantitative in nature and increasingly use regression-based methods and classifiers such as classification trees and random forests. One recent development that is becoming more widely used is the MuPDAR (Multifactorial Prediction and Deviation Analysis using Regressions) approach of Gries and Deshors (2014) and Gries and Adelman (2014). This approach attempts to improve on traditional regression- or tree-based approaches in three steps. First, a model is trained on the reference speakers (often native speakers (NS) in learner corpus studies or British English speakers in variety studies). Second, this model is used to predict what such a reference speaker would produce in the situation the target speaker (often a non-native speaker (NNS) or an indigenized-variety speaker) is in. Crucially, the third step consists of determining whether the target speakers made a canonical choice or not and exploring that variability with a second regression model or classifier. Both regression-based modeling in general and MuPDAR in particular have led to many interesting results, but we want to propose two changes in perspective on the results they produce. First, we want to focus attention on the middle ground of the prediction space, i.e. the predictions of a regression/classifier that, essentially, are made non-confidently and translate into a statement such as ‘in this context, both/all alternants would be fine’. Second, we want to argue for greater attention to misclassifications/-predictions, propose a method to identify them, and discuss what we can learn from studying them. We exemplify our two suggestions with a brief case study, namely the dative alternation in native and learner corpus data.
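The three MuPDAR steps and the "middle ground" idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the reference model is a hand-written logistic regression fit by gradient descent, the single predictor (a made-up length difference between recipient and theme), the toy data, the function names, and the `band` threshold of 0.15 around a predicted probability of 0.5 are all hypothetical and do not come from the paper.

```python
import math

def predict_proba(w, x):
    """Predicted probability of outcome 1 (say, the prepositional dative)."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Step 1: fit a logistic regression on the reference speakers.

    Plain batch gradient descent; w[0] is the intercept.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def mupdar_labels(w, X_target, y_target, band=0.15):
    """Steps 2 and 3: predict what a reference speaker would do in each
    target-speaker context, then compare with the actual choice.

    Predictions within `band` of 0.5 are labelled "middle" (assumed
    threshold), i.e. the model is not confident and both alternants
    would be fine.
    """
    rows = []
    for x, y in zip(X_target, y_target):
        p = predict_proba(w, x)
        if abs(p - 0.5) <= band:
            status = "middle"
        else:
            predicted = 1 if p > 0.5 else 0
            status = "canonical" if predicted == y else "deviant"
        rows.append((p, status))
    return rows

# Hypothetical reference-speaker (NS) data: one predictor, binary choice.
X_ref = [[-3], [-2], [-1], [-1], [0], [0], [1], [1], [2], [3]]
y_ref = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

# Hypothetical target-speaker (NNS) data: contexts and observed choices.
X_target = [[3], [3], [0], [-3]]
y_target = [1, 0, 1, 1]

w = fit_logistic(X_ref, y_ref)
for (p, status), x, y in zip(mupdar_labels(w, X_target, y_target), X_target, y_target):
    print(f"x={x[0]:+d} observed={y} p_ref={p:.2f} -> {status}")
```

In a fuller analysis, the per-case labels (or a continuous deviation score) would serve as the response of the second regression model or classifier mentioned in the abstract. The "middle" and "deviant" cases are the ones the two proposed changes in perspective draw attention to.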


Source journal: ICAME Journal: Computers in English Linguistics

Self-citation rate: 0.00%

Review time: 32 weeks