Moving to continuous classifications of bilingualism through machine learning trained on language production

IF 2.5; Zone 1, Literature; Q1 LINGUISTICS; Bilingualism: Language and Cognition; Pub Date: 2024-05-24; DOI: 10.1017/s1366728924000361
M. I. Coco, G. Smith, R. Spelorzi, M. Garraffa

Abstract

Recent conceptualisations of bilingualism are moving away from strict categorisations, towards continuous approaches. This study supports this trend by combining empirical psycholinguistics data with machine learning classification modelling. Support vector classifiers were trained on two datasets of coded productions by Italian speakers to predict the class they belonged to (“monolingual”, “attriters” and “heritage”). All classes can be predicted above chance (>33%), even if the classifier's performance substantially varies, with monolinguals identified much better (f-score >70%) than attriters (f-score <50%), which are instead the most confusable class. Further analyses of the classification errors expressed in the confusion matrices qualify that attriters are identified as heritage speakers nearly as often as they are correctly classified. Cluster clitics are the most identifying features for the classification performance. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than sets of a priori established classes.
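
The abstract reports per-class f-scores, a chance level of about 33% for three classes, and confusion between attriters and heritage speakers read off a confusion matrix. As a minimal sketch of how those quantities relate, the Python below derives per-class f-scores from a three-class confusion matrix. The counts in `HYPOTHETICAL_CONFUSION` are invented for illustration only and are not the study's data or results; the function name `per_class_f_scores` and the row/column layout (rows as true class, columns as predicted class) are likewise assumptions, not the authors' pipeline.

```python
from typing import Dict, List

CLASSES = ["monolingual", "attriter", "heritage"]

# Invented counts for illustration only (NOT from the study).
# Rows are the true class, columns the predicted class, in CLASSES order.
HYPOTHETICAL_CONFUSION = [
    [40, 5, 5],    # true monolingual
    [10, 21, 19],  # true attriter: labelled heritage nearly as often as attriter
    [6, 14, 30],   # true heritage
]


def per_class_f_scores(confusion: List[List[int]], labels: List[str]) -> Dict[str, float]:
    """Return the F1 score of each class from a square confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        true_pos = confusion[i][i]
        actual = sum(confusion[i])                    # row total: items truly in this class
        predicted = sum(row[i] for row in confusion)  # column total: items assigned to this class
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# With three classes, uniform guessing is right one time in three.
chance_level = 1 / len(CLASSES)
f_scores = per_class_f_scores(HYPOTHETICAL_CONFUSION, CLASSES)

for label in CLASSES:
    print(f"{label}: f-score = {f_scores[label]:.2f} (chance = {chance_level:.2f})")
```

In this made-up matrix the attriter row splits almost evenly between the attriter and heritage columns, which is the kind of pattern the abstract describes; the f-score alone would not reveal which class absorbs the errors, which is why the confusion matrix is inspected as well.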
