
Latest publications in Research Methods in Applied Linguistics

EFL students’ engagement in machine translation-assisted writing: Scale development and validation
Pub Date: 2025-09-12 DOI: 10.1016/j.rmal.2025.100260
Mariko Yuasa, Osamu Takeuchi
The use of machine translation (MT) in second language (L2) writing has been criticised for potentially disengaging low-proficiency English as a foreign language (EFL) learners from the writing process. However, little is known about how these learners use MT, let alone how their engagement with the writing process should be empirically measured. Therefore, this study developed and validated a multidimensional scale to assess low-proficiency EFL students’ engagement in MT-assisted writing. A 24-item instrument encompassing behavioural, cognitive, affective, social, and agentic engagement subscales was administered to 773 Japanese university students at the CEFR-A2 level (final sample, N = 708). Exploratory factor analysis with half the participants (n = 354) identified five engagement constructs, with cognitive engagement subdivided into pre-editing and post-editing. Agentic engagement was excluded due to a low factor loading. The hypothesised five-factor model was then tested on the remaining participants (n = 354) through confirmatory factor analysis, which yielded satisfactory reliability, as well as construct, convergent, and discriminant validity. This newly developed scale is a valuable tool for investigating how low-proficiency EFL students engage in L2 writing with MT, which often remains unobservable in classrooms where MT use is discouraged or prohibited.
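A minimal sketch of the split-sample workflow this abstract describes, using factor_analyzer for the exploratory step and semopy for the confirmatory step. The file name, item names, and factor structure below are hypothetical placeholders, not the authors' actual instrument, which is not reproduced here.

```python
# Split-half EFA -> CFA workflow, assuming a CSV of Likert-type items
# named item1..item15 (hypothetical; stands in for the 24-item scale).
import pandas as pd
from factor_analyzer import FactorAnalyzer
import semopy

data = pd.read_csv("engagement_items.csv")       # hypothetical file
half_a = data.sample(frac=0.5, random_state=42)  # EFA subsample
half_b = data.drop(half_a.index)                 # held-out CFA subsample

# Exploratory step: extract five factors with an oblique rotation.
efa = FactorAnalyzer(n_factors=5, rotation="promax")
efa.fit(half_a)
print(efa.loadings_)  # inspect loadings; flag weak items (e.g., < .40)

# Confirmatory step: test the hypothesised five-factor model on the other half.
model_desc = """
behavioural  =~ item1 + item2 + item3
pre_editing  =~ item4 + item5 + item6
post_editing =~ item7 + item8 + item9
affective    =~ item10 + item11 + item12
social       =~ item13 + item14 + item15
"""
cfa = semopy.Model(model_desc)
cfa.fit(half_b)
print(semopy.calc_stats(cfa))  # CFI, TLI, RMSEA, and other fit indices
```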
Citations: 0
Frequentist vs. Bayesian methods: Choosing appropriate statistical methods in second language research
Pub Date: 2025-09-03 DOI: 10.1016/j.rmal.2025.100256
Shotaro Ueno, Osamu Takeuchi
Null hypothesis significance testing (NHST) with p-values is one of the most commonly used statistical procedures in second language research. This statistical approach follows the principles of the frequentist method, and although it has various advantages, some researchers have noted its limitations and proposed Bayesian methods as an alternative. To contribute to this debate, this article introduces the basic principles of Bayesian statistics, specifically Bayesian hypothesis testing (BHT), and explores its advantages and limitations compared to the frequentist approach, particularly NHST. The article first outlines the foundational concepts of NHST and reviews the main criticisms associated with its use. It then presents the core ideas of Bayesian methods, with a primary focus on the Bayes factor, followed by a description of general procedures for conducting BHT and an overview of its potential benefits in applied research contexts. Additionally, several challenges and criticisms of Bayesian methods are discussed, emphasizing that they are not always a superior alternative. Based on these discussions, the article argues that both frequentist and Bayesian methods have strengths and limitations, and that specific research goals, questions, and contexts should guide the choice of statistical framework.
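As a concrete illustration of the NHST/BHT contrast the article draws, the sketch below computes both a p-value and a default Bayes factor for a two-group comparison with pingouin. The data are simulated, and the JZS prior scale (r = 0.707) is pingouin's default rather than a recommendation from the article.

```python
# Frequentist t-test vs. default Bayes factor on simulated group data.
import numpy as np
import pingouin as pg

rng = np.random.default_rng(1)
control = rng.normal(50, 10, 40)    # simulated test scores
treatment = rng.normal(55, 10, 40)

res = pg.ttest(treatment, control)  # returns a one-row DataFrame
print(res[["T", "dof", "p-val", "BF10", "cohen-d"]])
# 'p-val' answers "how surprising are the data under H0?", while
# 'BF10' quantifies relative evidence: BF10 = 3 means the data are
# three times likelier under H1 than under H0.
```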
Citations: 0
Toward research inclusivity in applied linguistics: Methodological considerations for inclusive online experimentation
Pub Date: 2025-09-03 DOI: 10.1016/j.rmal.2025.100255
Kathy Kim, Erning Henry Chen
Building on the growing shift toward remote research and the need for inclusive, participant-centered methodologies, this commentary explores the potential of online experimentation in diversification—both at the stages of recruitment and experimental implementation. We first discuss recruitment approaches, including crowdsourcing and social media, and their potential—along with their limitations—for broadening participation among underrepresented learner populations. We then draw on a framework grounded in three interconnected principles—Value, Trust, and Agency—to explore ways inclusivity can be meaningfully incorporated into experimental implementation. Informed by research across disciplines, we offer practical suggestions for designing online experiments that aim to be both methodologically robust and responsive to the varied realities of participants.
Citations: 0
Evaluating the scoring system of an AI-integrated app to assess foreign language phonological decoding
Pub Date: 2025-09-01 DOI: 10.1016/j.rmal.2025.100257
James Turner, Alison Porter, Suzanne Graham, Travis Ralph-Donaldson, Heike Krüsemann, Pengchong Zhang, Kate Borthwick
Phonological decoding in a foreign language (FL)—a two-part process involving first the ability to map written symbols to their corresponding sounds and second to pronounce them intelligibly—is foundational for reading and vocabulary acquisition. Yet assessing this skill efficiently and at scale in young learners remains a persistent challenge. Here, we introduce and evaluate the accuracy and effectiveness of a novel method for assessing FL phonological decoding using an AI-driven app that automatically scores children's pronunciation of symbol-sound correspondences. In a study involving 254 learners of French and Spanish (aged 10–11) across five UK primary schools, pupils completed a read-aloud task (14 symbol-sound correspondences) that was scored by the app’s automatic speech recognition (ASR) technology. The validity of these automated scores was tested by fitting them as independent variables in regression models predicting human auditory coding. The multiple significant relationships between automated and human scores that were established indicate that there is great potential for ASR-based tools to reliably assess phonological decoding in this population. These findings provide the first large-scale empirical validation of an AI-based assessment of FL decoding in children, opening new possibilities, applicable to a range of languages being learnt, for scalable and efficient assessment.
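The validation logic here, fitting automated scores as predictors of human codings, can be sketched as below. The file, column names, and the binary human coding are assumptions for illustration, not the study's actual data or model specification.

```python
# Does the app's ASR score predict the human coder's judgement?
# Hypothetical long-format data: one row per pupil x symbol-sound item.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("decoding_scores.csv")  # hypothetical file with columns:
# 'human' (1 = human coded the item as correct, 0 = incorrect)
# 'asr'   (the app's automated score for the same item)

model = smf.logit("human ~ asr", data=df).fit()
print(model.summary())
# A significant positive slope for 'asr' indicates that the automated
# scores track human auditory coding, supporting their validity.
```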
Citations: 0
Automate the ‘boring bits’: An assessment of AI-assisted systematic review (AIASR)
Pub Date: 2025-09-01 DOI: 10.1016/j.rmal.2025.100258
Timothy Hampson, Kelly Cargos, Jim McKinley
Systematic review is a powerful tool for disseminating the findings of research, particularly in applied linguistics where we hope to provide insights for practising language teachers. Yet, systematic review is also often prohibitively time-consuming, particularly for small, underfunded teams or solo researchers. In this study, we explore the use of generative artificial intelligence to ease the burden of screening and organising papers. Our findings suggest that AI excels in some tasks, particularly when those tasks involve explicitly stated information, and struggles in others, particularly when information is more implicit. A comparison of generative artificial intelligence for filtering papers with ASReview, a popular non-generative tool, reveals trade-offs, with Generative AI being replicable and more efficient, but with concerns about accuracy. We conclude that generative artificial intelligence can be a useful tool for systematic review but requires rigorous validation before use. We conclude by emphasising the importance of testing AI for systematic review tasks and exploring how this can practically be achieved.
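A minimal sketch of the title-and-abstract screening step the study automates, using the OpenAI Python client. The model name, inclusion criteria, and record fields are placeholder assumptions, and, as the authors stress, any real screening pipeline would need rigorous validation against human decisions before use.

```python
# LLM-assisted include/exclude screening of candidate papers.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITERIA = """Include only if the paper (a) is an empirical study,
(b) concerns language teaching or learning, and (c) reports learner data."""

def screen(title: str, abstract: str) -> str:
    """Return 'INCLUDE' or 'EXCLUDE' for one bibliographic record."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whichever model you validate
        messages=[
            {"role": "system",
             "content": f"You screen papers for a systematic review.\n{CRITERIA}\n"
                        "Answer with exactly one word: INCLUDE or EXCLUDE."},
            {"role": "user", "content": f"Title: {title}\nAbstract: {abstract}"},
        ],
        temperature=0,  # near-deterministic output aids replicability
    )
    return resp.choices[0].message.content.strip()

print(screen("Task-based vocabulary learning in EFL classrooms",
             "This study examines 120 learners completing..."))
```

Spot-checking such outputs against a human-labelled subset, as the study does, is the obvious first validation step before trusting the filter at scale.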
Citations: 0
Generating synthetic data for CALL research with GenAI: A proof-of-concept study
Pub Date: 2025-08-26 DOI: 10.1016/j.rmal.2025.100248
Dennis Foung, Lucas Kohnke
Popular tools like ChatGPT have placed generative artificial intelligence (GenAI) in the spotlight in recent years. One use of GenAI tools is to generate simulated data—or synthetic data—when the full scope of the required microdata is unavailable. Despite suggestions for educational researchers to use synthetic data, little (if any) computer-assisted language learning (CALL) research has used synthetic data thus far. This study addresses this research gap by exploring the possibility of using synthetic datasets in CALL. The publicly available dataset resembles a typical study with a small sample size (n = 55) performed using a CALL platform. Two synthetic datasets are generated from the original datasets using the synthpop package and generative adversarial networks (GAN) in R (via the RGAN package), which are both common synthetic data generation methods. This study evaluates the synthetic datasets by (a) comparing the distribution between the synthetic and original datasets, (b) examining the model parameters of the rebuilt linear models using the synthetic and original datasets, and (c) examining the privacy disclosure metrics. The results suggest that synthpop better represents the original data and preserves privacy. Notably, the GAN-generated dataset does not produce satisfactory results. This demonstrates GAN’s key challenges alongside the potential benefits of generating synthetic data with synthpop.
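The authors work in R (synthpop, RGAN); as a language-neutral illustration of their evaluation step (a), the Python sketch below compares each variable's marginal distribution in the original and synthetic datasets with a two-sample Kolmogorov-Smirnov test. The file names and columns are hypothetical, and this shows only one of the three evaluation criteria the study applies.

```python
# Compare original vs. synthetic marginal distributions, column by column.
import pandas as pd
from scipy.stats import ks_2samp

original = pd.read_csv("call_original.csv")   # hypothetical files
synthetic = pd.read_csv("call_synthetic.csv")

for col in original.select_dtypes("number").columns:
    stat, p = ks_2samp(original[col], synthetic[col])
    flag = "differs" if p < .05 else "similar"
    print(f"{col}: KS = {stat:.3f}, p = {p:.3f} -> {flag}")
# A well-behaved synthesiser (synthpop, in the authors' results) should
# leave most columns statistically indistinguishable from the original,
# while a poorly fitted GAN will show many flagged differences.
```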
Citations: 0
Eight reasons not to test for baseline group equivalence in a parallel groups pretest-posttest study
Pub Date: 2025-08-25 DOI: 10.1016/j.rmal.2025.100254
Seth Lindstromberg
The parallel groups pretest-posttest design has long been prominent in quantitative research on SLA. Ideally, groups are formed by random assignment of individuals. But with or without random assignment, groups may differ substantially on key pre-treatment measures such as pretest scores. When faced with non-equivalent groups, many SLA researchers have tested the difference(s) for statistical significance in the belief that p > .05 allows a main statistical analysis which assumes that the pretreatment group means do not differ. The literature of applied statistics includes numerous accounts of why such “baseline equivalence” (BE) testing is misguided. Yet BE tests continue to be reported in SLA journals at all levels of reputation. This paper describes BE testing, reviews its flaws, shows that the practice persists, discusses possible reasons why BE tests may be thought legitimate, and considers options in study planning that lead to superior results and avoid conditions that appear to make BE testing necessary.
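One widely recommended alternative to testing baseline equivalence, adjusting for the pretest directly whatever the observed group difference, can be sketched as an ANCOVA-style model. This is a standard remedy from the applied-statistics literature, offered here as illustration rather than as the paper's specific recommendation; the data and column names are hypothetical.

```python
# ANCOVA-style analysis: model the posttest with pretest as a covariate,
# instead of first testing whether the groups 'differ at baseline'.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pre_post_study.csv")  # hypothetical columns:
# 'posttest', 'pretest', 'group' ('treatment' / 'control')

model = smf.ols("posttest ~ pretest + C(group)", data=df).fit()
print(model.summary())
# The C(group) coefficient estimates the treatment effect adjusted for
# pretest scores, so a separate baseline-equivalence test adds nothing.
```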
Citations: 0
Towards a preschooler corpus of Italian: an experimental journey
Pub Date: 2025-08-25 DOI: 10.1016/j.rmal.2025.100252
Chiara Bolognesi, Alessandra Cinini, Paola Cutugno, Melissa Ferretti, Davide Chiarella
The paper surveys the process and reasoning behind the written-sources section of the Corpus of Italian for Preschoolers (CIP), a corpus collecting child-directed speech targeted at Italian children aged 3–6. Beginning from an overview of the available child-speech and child-directed-speech corpora, the article underlines the need for an Italian corpus focusing on children’s passive vocabulary, and shows how such a tool would be useful both for future comparative studies on children’s own production and for professionals working with children’s needs. The CIP aims to collect 250,000 linguistic tokens across a selection of different sources (written, spoken, signed) gathered with the help of schools and families. This paper focuses specifically on the selection criteria for the written sources and the first steps of their linguistic processing, explaining through a set of three experiments how three different linguistic annotation tools performed on the tasks of tokenizing, lemmatizing, and POS-tagging three different children’s literature texts. The last part presents the results of the experiments with insight into the NLP tools’ performance, as well as the reasons for our choice of tool for the large-scale annotation process and the still-ongoing challenges in finalizing the corpus.
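The paper's three benchmarked tools are not named in this abstract; the sketch below shows what the tokenise/lemmatise/POS-tag pipeline looks like with spaCy's small Italian model, as one representative tool rather than necessarily one of those the authors tested. It assumes the model has been installed with `python -m spacy download it_core_news_sm`.

```python
# Tokenise, lemmatise, and POS-tag an Italian sentence (invented example
# in the style of children's literature).
import spacy

nlp = spacy.load("it_core_news_sm")  # small Italian pipeline
doc = nlp("Il piccolo principe guardava le stelle.")

for token in doc:
    print(f"{token.text:<10} lemma={token.lemma_:<12} pos={token.pos_}")
# Agreement between tools on exactly these three columns is what the
# paper's experiments compare across candidate annotators.
```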
Citations: 0
Automated vs. manual linguistic annotation for assessing pragmatic competence in English classes
Pub Date: 2025-08-20 DOI: 10.1016/j.rmal.2025.100253
Mohsen Mahmoudi-Dehaki, Nasim Nasr-Esfahani
Evaluating pragmatic competence remains a complex and critical challenge in applied linguistics, particularly in English as a Foreign Language (EFL) contexts. This study aims to address this gap by examining the potential of automating pragmatic competence assessment using AI-powered text analytics. Employing an explanatory sequential mixed-methods design, the quantitative phase compares the accuracy of automated versus manual linguistic annotation in evaluating the pragmatic skills of EFL learners. In the qualitative phase, factors influencing the accuracy of manual annotation are explored. For automated annotation, ChatGPT-4 Omni (ChatGPT-4o) processed 116 transcriptions representing participants' performances across six verbal discourse completion tasks (DCTs), encompassing prosodic features and pragmatic functions such as requesting favors, apologizing, suggesting, complaining, inviting, and refusing invitations. The AI model was fine-tuned using a human-in-the-loop approach, incorporating ensemble techniques such as few-shot learning and instructional prompts. Manual annotation employed trained EFL teachers using standardized assessment cards. Results indicate that automated annotation surpasses manual accuracy in evaluating most pragmatic components, except cultural norms, where both methods exhibit limitations. Focus group findings reveal that annotator bias, fatigue, technological influences, linguistic background differences, and subjectivity impact manual annotation accuracy. This interdisciplinary investigation expands the methodological toolkit for pragmatic competence evaluation and holds significant implications for fields such as digital humanities, computational pragmatics, language education, machine learning, and natural language processing.
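The few-shot, human-in-the-loop setup described above can be sketched roughly as follows, again with the OpenAI client. The rubric, the worked example, and the response format are invented placeholders; the study's actual prompts, ensemble techniques, and fine-tuning steps are not reproduced here.

```python
# Few-shot scoring of one DCT response against a simple pragmatic rubric.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

RUBRIC = ("Score 0-3 for each of: directness, politeness markers, and "
          "appropriateness to the situation. Reply exactly as 'D=.. P=.. A=..'.")

FEW_SHOT = [  # one worked example; the study used several per function
    {"role": "user",
     "content": "Task: apologising. Response: 'Sorry teacher, I am late bus.'"},
    {"role": "assistant", "content": "D=3 P=1 A=2"},
]

def score_dct(task: str, transcript: str) -> str:
    """Score one transcribed DCT response with the few-shot rubric prompt."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the ChatGPT-4o setup in the study
        messages=[{"role": "system", "content": RUBRIC}, *FEW_SHOT,
                  {"role": "user",
                   "content": f"Task: {task}. Response: '{transcript}'"}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

print(score_dct("refusing an invitation", "Ah, thank you, but maybe I busy."))
```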
Citations: 0
The L-maze task and web-based data collection in second language sentence processing research
Pub Date: 2025-08-19 DOI: 10.1016/j.rmal.2025.100251
Hiroki Fujita
In recent years, an increasing number of studies on sentence processing have used web-based data collection and the L-maze task. Web-based data collection has become particularly popular since the coronavirus pandemic, when access to laboratory-based experiments was severely restricted. In the L-maze task, participants read sentences word by word, with each word presented alongside a pseudoword that does not continue the sentence. During the task, participants need to select a word that continues the sentence. Previous research has shown that both web-based data collection and the L-maze task are useful for investigating first language sentence processing. However, little is known about their usefulness for second language sentence processing research. To address this gap in the literature, I conducted replication experiments using the web-based L-maze and self-paced reading (SPR) tasks, and investigated whether these tasks could detect garden path and gender mismatch effects during the processing of locally ambiguous sentences. The results showed these effects in both tasks, with the effects being more localised in the L-maze task. A prospective power analysis suggested that these tasks would be effective for detecting these effects, and that the L-maze task would be more reliable than the SPR task for detecting gender mismatch effects. These findings suggest that web-based data collection and the L-maze task are potentially useful tools for investigating second language sentence processing.
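Detecting a garden-path effect in either task ultimately comes down to comparing response times across conditions at the critical region. A minimal mixed-effects sketch in statsmodels is below, with file and column names assumed for illustration (lme4-style models in R are the field's more common choice, and a full analysis would also include random slopes and item effects).

```python
# Garden-path effect: ambiguous vs. unambiguous RTs at the critical word,
# with random intercepts for participants.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("maze_rts.csv")  # hypothetical columns:
# 'rt' (ms at the disambiguating word), 'condition'
# ('ambiguous' / 'unambiguous'), 'participant', 'item'

df = df[df["rt"].between(100, 5000)]  # simple outlier trimming

m = smf.mixedlm("rt ~ C(condition)", df, groups=df["participant"]).fit()
print(m.summary())
# Longer RTs in the ambiguous condition at this region index the
# garden-path effect; the study finds the effect more localised in the
# L-maze task than in self-paced reading.
```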
Citations: 0