
Research Methods in Applied Linguistics: Latest Publications

Validation of online L2 vocabulary tests: Test performance across laboratory, virtual meeting, and crowdsourcing contexts
Pub Date : 2025-07-24 DOI: 10.1016/j.rmal.2025.100246
Ayako Aizawa
Online data collection has become increasingly common in diverse fields, including marketing and psychology, and is gaining ground in applied linguistics. Although concerns have been raised about the validity and reliability of online assessments, previous research on online data collection suggests that, with appropriate precautions, data quality can be comparable to that obtained using in-person methods. However, the validity and reliability of online vocabulary tests have not been thoroughly investigated. To fill this gap, the present study compared the results of online vocabulary tests with those of face-to-face administration. In this study, 159 Japanese university students took the Vocabulary Size Test and Phrasal Vocabulary Size Test in three environments: (a) in-person (laboratory), (b) online with supervision (virtual meeting), and (c) online without supervision (crowdsourcing). Reliability and validity were analysed, and results showed that test performance was largely comparable: test environment and presence or absence of supervision had minimal effects on three out of the four tests, with only the meaning recall format of the Vocabulary Size Test showing significantly inflated scores in the crowdsourcing condition. While the findings suggest that pooling data online and aggregating data from different environments are feasible for vocabulary testing research, they also highlight the need for careful planning in research design to achieve a desirable environment for the participants to take the tests.
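A reliability comparison of this kind can be sketched in a few lines of Python. The sketch below is illustrative only, not the author's analysis; the file name, column names, and 0/1 scoring format are assumptions.

```python
# Minimal sketch (not the author's code): comparing internal-consistency
# reliability and mean scores of a vocabulary test across three administration
# contexts. File name and column names are hypothetical.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a persons-by-items score matrix (rows = persons)."""
    items = items.dropna()
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Each row: one participant; 'context' is 'laboratory', 'virtual_meeting', or
# 'crowdsourcing'; the remaining columns are item-level scores (0/1).
scores = pd.read_csv("vst_item_scores.csv")

for context, group in scores.groupby("context"):
    item_cols = group.drop(columns=["context"])
    alpha = cronbach_alpha(item_cols)
    mean_total = item_cols.sum(axis=1).mean()
    print(f"{context}: alpha = {alpha:.2f}, mean total score = {mean_total:.1f}")
```

Comparable alpha values and score distributions across the three contexts would be consistent with the paper's conclusion that data from different environments can be pooled.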
Citations: 0
Exploring sentiment across disciplines and argumentative moves: A sentiment analysis of open-access comments
Pub Date : 2025-07-23 DOI: 10.1016/j.rmal.2025.100243
Wenjuan Qin , Yueling Sun , Tan Jin
Sentiment analysis, a computational method originating from natural language processing, has recently gained interest in applied linguistics as a tool for examining evaluative language in academic discourse. This study applies sentiment analysis to open-access comments (OA comments), a novel academic genre designed to engage a broad readership across disciplines. Studying sentiment in these comments is crucial, as it reveals how scholars express not only factual information but also their emotions and attitudes towards the topics under discussion. The corpus includes 361 open-access comments published in Nature. The results reveal significant differences in sentiment scores across hard and soft science disciplines and across argumentative moves. These findings highlight the potential of sentiment analysis as a promising method for exploring open-access comments as a unique academic genre, deepening our understanding of academic writing and informing academic writing pedagogy, particularly in emerging hybrid genres such as OA comments.
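As a rough illustration of the kind of scoring involved, the sketch below applies the VADER analyzer from NLTK to a couple of invented comments. The paper does not specify this tool, and the example comments and move labels are made up.

```python
# Minimal sketch (an illustration, not the authors' pipeline): scoring the
# sentiment of open-access comments with VADER from NLTK and grouping the
# scores by argumentative move. The comments and move labels are invented.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

comments = [
    {"move": "claim", "text": "This study convincingly demonstrates the value of the new method."},
    {"move": "counterclaim", "text": "However, the sample is too small to support such strong conclusions."},
]

for comment in comments:
    # polarity_scores returns neg/neu/pos proportions plus a normalized
    # 'compound' score in [-1, 1], which is a common single-number summary.
    scores = analyzer.polarity_scores(comment["text"])
    print(comment["move"], round(scores["compound"], 3))
```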
Citations: 0
Developing and piloting SemiMed—A resource for semi-technical medical vocabulary
Pub Date : 2025-07-22 DOI: 10.1016/j.rmal.2025.100239
Chinh Ngan Nguyen Le, Julia Miller
Semi-technical medical vocabulary—words that often convey different meanings depending on context—commonly poses challenges for teaching and learning. These difficulties are largely due to polysemy and homography, which are not fully addressed in conventional dictionaries or frequency wordlists. This study aimed to develop and pilot a new lexical resource, named SemiMed, that explicitly accounts for polysemy and homography in semi-technical medical vocabulary. The starting point was Hsu’s (2013) corpus-based Medical Word List, which is useful for the teaching and learning of words with single but not multiple meanings. Multi-meaning semi-technical medical words in Hsu’s list were analyzed using a lexical semantic approach to polysemy and homography. A corpus-based analysis followed, to quantify word meaning frequency. Cantos and Sanchez’s (2001) Lexical Constellations were then adapted to showcase intricate interrelations between general and specialized meanings of semi-technical medical words. To examine SemiMed’s usefulness, a pilot study was conducted where 18 EFL medical students were provided with lexical resources, including SemiMed samples and conventional dictionaries, to help them use appropriate vocabulary while role-playing targeted medical scenarios. Focus groups were conducted to gain participants’ feedback on the usefulness of SemiMed.
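The corpus-based step of quantifying word meaning frequency can be illustrated with a small tally over sense-annotated tokens. The sketch below is not the SemiMed pipeline; the annotation format and examples are invented.

```python
# Minimal sketch (assumed data format, not the SemiMed workflow): quantifying
# how often each sense of a semi-technical medical word occurs in a
# sense-annotated corpus. The annotations here are invented.
from collections import Counter, defaultdict

# Each record: (word, sense label) produced by manual or semi-automatic sense tagging.
annotations = [
    ("culture", "laboratory growth of microorganisms"),
    ("culture", "laboratory growth of microorganisms"),
    ("culture", "shared customs and beliefs"),
    ("delivery", "birth of a baby"),
    ("delivery", "administration of a drug"),
    ("delivery", "birth of a baby"),
]

sense_counts = defaultdict(Counter)
for word, sense in annotations:
    sense_counts[word][sense] += 1

for word, counts in sense_counts.items():
    total = sum(counts.values())
    for sense, n in counts.most_common():
        print(f"{word!r}: {sense} -> {n}/{total} ({n / total:.0%})")
```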
Citations: 0
From black box to transparency: Enhancing automated interpreting assessment with explainable AI in college classrooms
Pub Date : 2025-07-14 DOI: 10.1016/j.rmal.2025.100237
Zhaokun Jiang , Ziyin Zhang
Recent advancements in machine learning have spurred growing interest in automated interpreting quality assessment. Nevertheless, existing research is subject to certain limitations, including insufficient examination of language use quality, restricted modeling effectiveness due to data scarcity at the highest and lowest performance tiers, and a lack of efforts to explain model predictions. To address these gaps, the present study proposes a multi-dimensional modeling framework that integrates feature engineering, data augmentation, and explainable machine learning. This approach prioritizes explainability over “black box” predictions by utilizing only construct-relevant, transparent features and conducting SHAP analysis, an explainable AI (XAI) method. Our results demonstrated relatively strong predictive performance on a self-compiled English-Chinese consecutive interpreting dataset: XGBoost excelled in predicting fluency (ρ = 0.86, RMSE = 0.61) and target language use (ρ = 0.79, RMSE = 0.75), while Random Forest was optimal for modeling information completeness (ρ = 0.68, RMSE = 1.05). SHAP analysis identified the strongest predictive features for each dimension: BLEURT and CometKiwi scores for information completeness, pause-related features for fluency, and Chinese-specific phraseological diversity metrics for language use. Overall, this study presents a scalable, reliable, and transparent alternative to traditional human evaluation, holding significant implications for automated language assessment. Notably, the emphasis on explainability facilitates the provision of detailed diagnostic feedback for learners and supports self-regulated learning—advantages not afforded by automated scores in isolation.
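A minimal sketch of this kind of modeling-plus-explanation pipeline is given below. The feature names, file name, and hyperparameters are assumptions, not the authors' setup; the intent is only to show how an XGBoost regressor and SHAP values fit together.

```python
# Minimal sketch (hypothetical features and data, not the authors' dataset):
# fitting XGBoost on construct-relevant features of interpreting performance
# and explaining predictions with SHAP.
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from scipy.stats import spearmanr
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Rows = interpreted renditions; columns = transparent features plus a human fluency rating.
data = pd.read_csv("interpreting_features.csv")  # hypothetical file
X = data[["pause_ratio", "mean_pause_duration", "bleurt", "cometkiwi", "phrase_diversity"]]
y = data["fluency_rating"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
rho, _ = spearmanr(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"Spearman rho = {rho:.2f}, RMSE = {rmse:.2f}")

# SHAP attributes each prediction to individual features, which is what turns
# the score into diagnostic feedback rather than a black-box number.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
```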
Citations: 0
Applying a polytomous Rasch model to investigate Likert scale functioning and L2 writing strategy use
Pub Date : 2025-07-11 DOI: 10.1016/j.rmal.2025.100240
Apichat Khamboonruang
While Rasch models have been increasingly employed in applied linguistics research, their use remains underexplored in L2 writing strategy research, which has relied primarily on statistical methods that assume continuous data. This study aimed to address this methodological gap by applying a polytomous Rasch modelling approach to investigate Likert scale functioning in the context of L2 writing strategy use. Participants were 172 Thai EFL English-major undergraduates who completed a 26-item, 5-category Likert-type scale designed to measure five strategy domains: metacognitive, effort-regulation, cognitive, social, and affective strategies. The data were analysed using a Rasch rating scale model (RSM) implemented in Winsteps and Facets software programmes. The main results indicated that the RSM analysis provided sound evidence of appropriate item and category functioning, while revealing specific areas for refinement, such as limited item coverage, item redundancy, and category disordering. The RSM analysis also revealed systematic trends in Thai EFL students’ writing strategy use across domains and proficiency levels: metacognitive strategies were used most often and clearly differentiated higher- and lower-achieving students, while social strategies were less common and more frequently used by lower achievers. These findings highlight the value of a polytomous Rasch modelling approach in examining not only rating scale functioning but also writing strategy use. The present findings have implications for rating scale validation and L2 writing strategy instruction.
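For readers less familiar with the model, the rating scale model expresses the probability of each Likert category in terms of a person ability, an item difficulty, and a shared set of category thresholds. The sketch below evaluates those probabilities directly from the formula; it does not perform the Winsteps/Facets estimation used in the study, and the parameter values are invented.

```python
# Minimal sketch: category probabilities under the Rasch rating scale model (RSM).
# This only evaluates probabilities for given parameters; it does not estimate
# them the way Winsteps or Facets does.
import numpy as np

def rsm_category_probs(theta: float, delta: float, thresholds: np.ndarray) -> np.ndarray:
    """P(X = k) for k = 0..m on a Likert item with shared thresholds tau_1..tau_m.

    theta: person ability (logits); delta: item difficulty; thresholds: tau_j, j = 1..m.
    """
    # Cumulative numerators: sum over j <= k of (theta - delta - tau_j),
    # with the k = 0 term fixed at 0 by convention.
    steps = theta - delta - thresholds
    numerators = np.concatenate(([0.0], np.cumsum(steps)))
    probs = np.exp(numerators)
    return probs / probs.sum()

# A 5-category Likert item (4 thresholds), an average-difficulty item, a mid-ability person.
taus = np.array([-1.5, -0.5, 0.5, 1.5])
print(rsm_category_probs(theta=0.0, delta=0.0, thresholds=taus).round(3))
```

Disordered estimated thresholds (tau values that do not increase with the category) are one sign of the category disordering the abstract mentions.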
Citations: 0
Indonesian cross-linguistic named entity recognition
Pub Date : 2025-07-08 DOI: 10.1016/j.rmal.2025.100236
Danang Arbian Sulistyo , Aji Prasetya Wibawa , Didik Dwi Prasetya , Fadhli Almu’iini Ahda
This study examines the potential of Named Entity Recognition (NER) in translating Biblical texts across Indonesian, Madurese, and Javanese. The goal is to enhance translation precision by incorporating entity categorization. The approach involves training an NER model using Conditional Random Fields (CRF) and evaluating its performance on the Book of Joshua. The annotated dataset includes features such as word identity, shape, part-of-speech identifiers, and semantic information. Tagging the data with labels such as Person, Location, and Organization reveals variations in effectiveness across languages. Indonesian yields the highest F1 score (78.69), reflecting consistent performance across all parameters. Although Madurese achieves a high recall for Location entities (82.16), its precision is lower (74.99). Javanese demonstrates strong precision in identifying locations (77.46), but a slightly lower recall score (77.21). The findings suggest the need to tailor the NER model to suit the specific characteristics of low-resource languages for improved translation quality.
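A CRF tagger of this general kind can be sketched with sklearn-crfsuite. The toy sentence, POS tags, labels, and feature set below are invented and stand in for the study's annotated Book of Joshua data.

```python
# Minimal sketch (toy data, not the study's corpus): a CRF sequence tagger for NER
# with word identity, word shape, and POS features of the kind described above.
import sklearn_crfsuite
from sklearn_crfsuite import metrics

def word_shape(token: str) -> str:
    return "".join("X" if c.isupper() else "x" if c.islower() else "d" if c.isdigit() else c
                   for c in token)

def token_features(sent, i):
    token, pos = sent[i]
    feats = {"word.lower": token.lower(), "word.shape": word_shape(token),
             "pos": pos, "is_title": token.istitle()}
    if i > 0:
        feats["prev.word.lower"] = sent[i - 1][0].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    return feats

# Each sentence: list of (token, POS); labels follow the BIO scheme.
train_sents = [[("Yosua", "PROPN"), ("pergi", "VERB"), ("ke", "ADP"), ("Yerikho", "PROPN")]]
train_labels = [["B-PER", "O", "O", "B-LOC"]]

X_train = [[token_features(s, i) for i in range(len(s))] for s in train_sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, train_labels)

pred = crf.predict(X_train)
print(metrics.flat_f1_score(train_labels, pred, average="weighted"))
```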
Citations: 0
Investigating LID in four test conditions - Do instructions, test formats and item positioning matter?
Pub Date : 2025-07-04 DOI: 10.1016/j.rmal.2025.100233
Hung Tan Ha , Duyen Thi Bich Nguyen , Tim Stoeckel
Recent research has found the Updated Vocabulary Levels Test (UVLT) to have Local Item Dependence (LID), a violation of the central assumption of all Rasch and Item Response Theory models. LID in the UVLT is hypothesized to be caused by a feature of matching tasks: once an option is selected for one target word, it will not be selected for another. It is also hypothesized that if this feature is removed, LID will be reduced. The present study investigated the effects of LID in four test conditions. The first employed the 3:6 matching format of the UVLT with no instruction concerning option recycling. The second used the same format but with instructions encouraging option recycling. The third utilized a multiple-choice format, with items belonging to the same UVLT cluster using identical sets of 6 options and placed adjacently. The fourth also used a multiple-choice, 6-option format, but items sharing identical options were far apart, making them less “local”. Data from 231 Vietnamese EFL learners were analyzed using Rasch unidimensional modelling and Rasch Testlet Modelling (RTM). Person estimates from the unidimensional models and the general dimensions from the RTMs were compared and correlated. Substantial LID was present in Conditions 1–3. Significant distortions of person estimates were found in all test conditions. However, the findings showed that LID had a negligible impact on person ordering in all test conditions.
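Local item dependence is often screened with Yen's Q3 statistic, the correlation between item residuals once the modelled trait is removed. The sketch below illustrates that screen only; it is not the Rasch testlet analysis reported above, and the observed and expected matrices are simulated placeholders for output from a fitted Rasch model.

```python
# Minimal sketch (not the study's analysis): screening for local item dependence
# with Yen's Q3, i.e., pairwise correlations between item residuals after the
# model-expected scores have been removed. The matrices are simulated stand-ins
# for a fitted Rasch model's observed responses and expected probabilities.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 231, 30

observed = rng.integers(0, 2, size=(n_persons, n_items)).astype(float)   # 0/1 responses
expected = rng.uniform(0.2, 0.8, size=(n_persons, n_items))              # model probabilities

residuals = observed - expected
q3 = np.corrcoef(residuals, rowvar=False)      # items x items residual correlations
np.fill_diagonal(q3, np.nan)

# Flag item pairs whose residual correlation clearly exceeds the average Q3,
# a common rule of thumb for suspecting local dependence.
mean_q3 = np.nanmean(q3)
flagged = np.argwhere(q3 > mean_q3 + 0.2)
print(f"mean Q3 = {mean_q3:.3f}, flagged pairs: {len(flagged) // 2}")
```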
Citations: 0
Do data collection methods matter for self-reported L2 individual differences questionnaires? In-person vs crowdsourced data
Pub Date : 2025-07-03 DOI: 10.1016/j.rmal.2025.100235
Ruirui Jia , Ekaterina Sudina , Kejun Du
Crowdsourcing offers great advantages in data collection by enabling researchers to recruit a large number of participants across geographical boundaries within a short period of time. Despite the benefits of crowdsourcing, no study has explored its validity in collecting self-reported individual differences (ID) data in second language (L2) research. The present study aims to address this gap by examining crowdsourcing as a viable alternative or complementary tool to traditional in-person data collection. We recruited a total of 209 in-person and 209 crowdsourced participants for comparison. Both groups completed the short versions of the Foreign Language Classroom Anxiety Scale and the Foreign Language Enjoyment Scale, provided their demographic and language learning background information, and completed the LexTALE test. Measurement invariance testing revealed that most (sub)constructs exhibited partial or full invariance, indicating stability in the measurement systems across both data collection settings. However, crowdsourced participants reported higher enjoyment and lower anxiety than in-person participants. These differences can be attributed to the more relaxed mental state of the crowdsourced participants who completed the survey outside of the classroom. Moreover, some crowdsourced participants tended to overrate their English proficiency and exhibited potentially dishonest behavior during the LexTALE test. These findings suggest that although crowdsourcing offers valuable opportunities for data collection in L2 ID research, the potential for inflated self-assessments and questionable behavior in an unsupervised online testing environment must be considered. Thus, the use of crowdsourcing platforms to collect self-reported L2 ID data requires caution and careful preparation.
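The group-level contrast in enjoyment and anxiety can be illustrated with a simple Welch t-test and an effect size. This is only the descriptive comparison, not the multi-group measurement invariance testing the study relies on, and the scores below are simulated.

```python
# Minimal sketch (simulated scores): comparing scale means between in-person and
# crowdsourced respondents with a Welch t-test and Cohen's d. The measurement
# invariance testing reported above requires a multi-group CFA, not shown here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
enjoyment_in_person = rng.normal(3.6, 0.7, size=209)     # hypothetical 1-5 scale means
enjoyment_crowdsourced = rng.normal(3.9, 0.7, size=209)

t, p = stats.ttest_ind(enjoyment_crowdsourced, enjoyment_in_person, equal_var=False)

# Cohen's d with a pooled standard deviation.
pooled_sd = np.sqrt((enjoyment_in_person.var(ddof=1) + enjoyment_crowdsourced.var(ddof=1)) / 2)
d = (enjoyment_crowdsourced.mean() - enjoyment_in_person.mean()) / pooled_sd
print(f"t = {t:.2f}, p = {p:.4f}, d = {d:.2f}")
```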
Citations: 0
Oral task repetition research via videoconferencing
Pub Date : 2025-07-03 DOI: 10.1016/j.rmal.2025.100232
Joe Kakitani
A substantial body of research has demonstrated the benefits of oral task repetition in enhancing second language (L2) performance. However, empirical studies investigating its effects on L2 development through longitudinal designs remain limited. This limitation may be partly due to the methodological challenges of traditional classroom- and laboratory-based research, such as participant attrition and scheduling difficulties. This paper explores the potential of online oral experimentation via videoconferencing—experiments conducted through synchronous computer-mediated communication using platforms like Zoom and Microsoft Teams—to advance L2 oral task repetition research. After reviewing research on task repetition and the methodological characteristics of conventional classroom- and laboratory-based studies that may present challenges within this domain, this article discusses the advantages of online experiments conducted via videoconferencing, including greater convenience and flexibility, increased efficiency, improved control of extraneous factors, and automated speech transcription. In addition, it examines the ecological validity of online video-based oral experiments. Methodological recommendations are also provided to help researchers address some of the challenges associated with conducting experiments via videoconferencing.
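As one concrete example of the automated transcription advantage, the sketch below uses the open-source openai-whisper package on a local recording. The paper does not prescribe a particular tool, and the file name is hypothetical.

```python
# Minimal sketch (assumes the openai-whisper package and a local recording):
# automated transcription of a videoconference task recording, one of the
# workflow advantages discussed above. The file name is hypothetical.
import whisper

model = whisper.load_model("base")                       # small multilingual model
result = model.transcribe("task_repetition_trial1.wav")  # text plus timestamped segments

print(result["text"])
for segment in result["segments"]:
    print(f'{segment["start"]:.1f}-{segment["end"]:.1f}s: {segment["text"]}')
```

The timestamped segments can feed directly into fluency measures such as speech rate or pause counts, provided the automatic transcript is spot-checked against the audio.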
Citations: 0
Leading a scoping review on L2 pronunciation: Some key elements of methodology
Pub Date : 2025-07-02 DOI: 10.1016/j.rmal.2025.100216
Linda Terrier , Marie Garnier , Saandia Ali
This article describes the methodology of a scoping review covering 25 years of research on L2 English pronunciation. We focus on two key methodological steps required in any scoping review: identifying the information source and selecting the studies. We present a rationale for employing a manual search across prominent journals in the fields of phonetics and phonology, second language acquisition, and second language learning and teaching. We describe how we delineated the scope of the review by identifying 35 prominent journals and how we organized teamwork to select relevant studies. We show that seemingly straightforward inclusion criteria (L2 English, empirical research, and pronunciation) raise questions about the objects of study in the field. The final corpus includes 463 articles published in the 35 identified journals between 1996 and 2020. We demonstrate that Arksey and O’Malley’s framework for scoping reviews can be applied and adapted to the specificities of L2 English pronunciation research, but we also highlight the challenge of iterativity in study selection. As we present the distribution of articles over time and across journals, we make recommendations for future scoping reviews regarding the time span of the review and the identification of the initial information source. In particular, the Journal of Second Language Pronunciation, which stands out as a central venue for L2 English pronunciation research, would have been missed had we used a more typical keyword search across academic databases.
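The reported distribution of articles over time and across journals amounts to a simple cross-tabulation; a sketch with pandas is shown below, with a hypothetical spreadsheet and column names.

```python
# Minimal sketch (hypothetical spreadsheet): tallying included articles by journal
# and publication year to produce the kind of distribution the review reports.
# Column names are assumptions, not the authors' coding scheme.
import pandas as pd

corpus = pd.read_csv("included_articles.csv")   # one row per included article
counts = corpus.groupby(["journal", "year"]).size().unstack(fill_value=0)

print(counts.sum(axis=1).sort_values(ascending=False).head(10))   # most represented journals
print(counts.sum(axis=0))                                         # articles per year, 1996-2020
```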
Citations: 0