Moving to continuous classifications of bilingualism through machine learning trained on language production

IF 2.6 · Zone 1 (Literature) · Q1 LINGUISTICS · Bilingualism: Language and Cognition · Pub Date: 2024-05-24 · DOI: 10.1017/s1366728924000361

M. I. Coco, G. Smith, R. Spelorzi, M. Garraffa



Abstract

Recent conceptualisations of bilingualism are moving away from strict categorisations, towards continuous approaches. This study supports this trend by combining empirical psycholinguistics data with machine learning classification modelling. Support vector classifiers were trained on two datasets of coded productions by Italian speakers to predict the class they belonged to (“monolingual”, “attriters” and “heritage”). All classes can be predicted above chance (>33%), even if the classifier's performance substantially varies, with monolinguals identified much better (f-score >70%) than attriters (f-score <50%), which are instead the most confusable class. Further analyses of the classification errors expressed in the confusion matrices qualify that attriters are identified as heritage speakers nearly as often as they are correctly classified. Cluster clitics are the most identifying features for the classification performance. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than sets of a priori established classes.
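To make the reported metrics concrete, the following is a minimal sketch of how a three-class confusion matrix and per-class f-scores relate to each other. It is not the authors' pipeline: the labels are made-up toy values (10 hypothetical speakers per class), chosen only to mimic the qualitative pattern described in the abstract, and the helper names `confusion_matrix` and `per_class_f_score` are illustrative assumptions. No support vector classifier is trained here; the predictions are simply written out by hand.

```python
from collections import Counter

# Class names follow the abstract; everything else below is hypothetical.
CLASSES = ("monolingual", "attriter", "heritage")


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Return a matrix whose rows are true classes and columns predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_f_score(matrix, classes=CLASSES):
    """Compute the F1 score (harmonic mean of precision and recall) per class."""
    scores = {}
    for i, name in enumerate(classes):
        true_positives = matrix[i][i]
        predicted_as_class = sum(row[i] for row in matrix)  # column total
        actually_class = sum(matrix[i])                     # row total
        precision = true_positives / predicted_as_class if predicted_as_class else 0.0
        recall = true_positives / actually_class if actually_class else 0.0
        denom = precision + recall
        scores[name] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy data: NOT taken from the study, invented purely for illustration.
true_labels = ["monolingual"] * 10 + ["attriter"] * 10 + ["heritage"] * 10
predicted_labels = (
    ["monolingual"] * 8 + ["attriter"] * 1 + ["heritage"] * 1    # true monolinguals
    + ["monolingual"] * 2 + ["attriter"] * 4 + ["heritage"] * 4  # true attriters
    + ["monolingual"] * 1 + ["attriter"] * 3 + ["heritage"] * 6  # true heritage
)

matrix = confusion_matrix(true_labels, predicted_labels)
scores = per_class_f_score(matrix)
accuracy = sum(matrix[i][i] for i in range(len(CLASSES))) / len(true_labels)
chance = 1 / len(CLASSES)  # chance level for three balanced classes

for name, row in zip(CLASSES, matrix):
    print(f"{name:12s} {row}  f-score={scores[name]:.2f}")
print(f"accuracy={accuracy:.2f} (chance={chance:.2f})")
```

In this toy matrix the attriter row reads `[2, 4, 4]`: the hypothetical attriters are labelled "heritage" exactly as often as they are labelled correctly, which is the kind of confusion the abstract describes. With balanced classes, chance accuracy is 1/3, which is where the ">33%" threshold comes from.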


Source journal

Bilingualism: Language and Cognition Multiple-

CiteScore

8.90

Self-citation rate

16.70%

Publication count