Considerations to promote and accelerate Open Science: A response to Winke
Rie Koizumi, Ryo Maie, Akifumi Yanagisawa, Yo In’nami
Pub Date: 2024-08-08 | DOI: 10.1177/02655322241239379
Open Science in language assessment research contexts: A reply to Winke
Carol A. Chapelle, Gary J. Ockey
Pub Date: 2024-08-08 | DOI: 10.1177/02655322241239377
Can language test providers do more to support open science? A response to Winke
Spiros Papageorgiou
Pub Date: 2024-08-08 | DOI: 10.1177/02655322241232361
In this letter, I first present examples of the adoption of Open Science by the language assessment industry. I then discuss some of the inevitable challenges language assessment professionals face as they continue to adopt Open Science.
Evaluating the impact of nonverbal behavior on language ability ratings
J. Dylan Burton
Pub Date: 2024-08-08 | DOI: 10.1177/02655322241255709
Nonverbal behavior can affect language proficiency scores in speaking tests, but there is little empirical evidence about the size or consistency of its effects, or about whether language proficiency moderates them. In this study, 100 novice raters watched and scored 30 recordings of test takers taking an international, high-stakes proficiency test. The speech samples were each 2 minutes long and spanned a range of proficiency levels. The raters scored each sample on fluency, vocabulary, grammar, and comprehensibility using 7-point semantic differential scales. Nonverbal behavior was extracted with automated machine-learning software (iMotions), and the data were analyzed with ordinal mixed-effects regression. Results showed that variance in attention predicted fluency, vocabulary, and grammar scores, but only when accounting for proficiency: higher standard deviations of attention corresponded with lower scores for the lower-proficiency group, but not for the mid/higher-proficiency group. Comprehensibility scores were predicted by mean valence only when proficiency entered as an interaction term: higher mean valence, that is, more positive emotional behavior, corresponded with higher scores in the lower-proficiency group, but not in the mid/higher-proficiency group. Effect sizes for these predictors were small, with little variance explained. These results have implications for construct representation and test fairness.
Sharing, collaborating, and building trust: How Open Science advances language testing
Paula Winke
Pub Date: 2024-08-08 | DOI: 10.1177/02655322231211159
The Open Science movement is taking hold around the world, and language testers are taking part. In this viewpoint, I discuss how sharing, collaborating, and building trust, guided by Open Science principles, benefit the language testing field. To help more language testers join in, I present a standard definition of Open Science and describe four ways language testing researchers can immediately partake. Overall, I share my views on how Open Science is an accelerating process that improves language testing as a scientific and humanistic field.
A Global South perspective on Open Science in language assessment: A response to Paula Winke
Atta Gebril, Maha Bali
Pub Date: 2024-08-08 | DOI: 10.1177/02655322241260121
An industry perspective on open science: A response to Winke (2024)
Geoffrey T. LaFlair
Pub Date: 2024-08-05 | DOI: 10.1177/02655322241261716
Open science practices are now at the forefront of discussions in the applied linguistics research community. Proponents of open science argue for its potential to enhance research quality and accessibility while promoting a collaborative and equitable environment. Winke advocates for integrating open science into language assessment research to enhance research quality, accessibility, and collaboration. This response introduces two additional perspectives to support open science practices. The first is a framework identifying five schools of thought on open science, which emphasizes understanding the various goals of open science and the scientific methods and tools used to pursue them. The second highlights two further characteristics of open science: the need for community and the costs of open science. These perspectives underscore the significance of making research processes transparent and inclusive, extending beyond traditional academic boundaries to engage the public and industry stakeholders. By integrating these considerations, this response aims to offer a nuanced view of the challenges and opportunities that open science presents in the field of language assessment, suggesting how researchers outside and inside the language assessment industry can work toward improving open science practices in language assessment research.
Do source use features impact raters’ judgment of argumentation? An experimental study
Ping-Lin Chuang
Pub Date: 2024-07-31 | DOI: 10.1177/02655322241263629
This experimental study explores how source use features affect raters’ judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters completed a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were adapted from essays written by EPT test takers and crafted to reflect different conditions of source use, namely source use quantity and quality. Rater scores were analyzed using the many-facet Rasch model and mixed two-way analyses of variance (ANOVAs) to examine how they were affected by source use features and rater experience. Results show that source use features influenced the argumentation scores raters assigned: paragraphs incorporating more source text ideas, and incorporating them well, received the highest argumentation scores, while those with limited, poorly integrated source information received the lowest. Rater experience affected scores but did not influence rater performance meaningfully. The findings connect specific source use features with raters’ evaluation of argumentation, helping to further disentangle the relationships among examinee performance, rater decisions, and task features in integrated argumentative writing tests. They also carry meaningful implications for writing assessment research and practice.
What is the best predictor of word difficulty? A case of data mining using random forest
Hung Tan Ha, Duyen Thi Bich Nguyen, Tim Stoeckel
Pub Date: 2024-07-30 | DOI: 10.1177/02655322241263628
Word frequency has long been considered the most important predictor of word difficulty and has guided several aspects of second language vocabulary teaching, learning, and assessment. Recent empirical research, however, has challenged the supremacy of frequency as a predictor of word difficulty, and applied linguists have accordingly questioned its use as the principal criterion in the development of wordlists and vocabulary tests. Although informative, previous studies on the topic have been limited in how word difficulty was measured and in the statistical techniques employed for exploratory data analysis. In the current study, meaning recall was used as the measure of word difficulty, and random forest was employed to examine the importance of various lexical sophistication metrics in predicting word difficulty. The results showed that frequency was not the most important predictor of word difficulty. Given the study’s limited scope, the findings are generalizable only to Vietnamese learners of English.
A Context-Aligned Two Thousand Test: Toward estimating high-frequency French vocabulary knowledge for beginner-to-low intermediate proficiency adolescent learners in England
Amber Dudley, Emma Marsden, Giulia Bovolenta
Pub Date: 2024-07-26 | DOI: 10.1177/02655322241261415
Vocabulary knowledge strongly predicts second language reading, listening, writing, and speaking. Yet, few tests have been developed to assess vocabulary knowledge in French. The primary aim of this pilot study was to design and initially validate the Context-Aligned Two Thousand Test (CA-TTT), following open research practices. The CA-TTT is a test of written form–meaning recognition of high-frequency vocabulary aimed at beginner-to-low intermediate learners of French at the end of their fifth year of secondary education. Using an argument-based validation framework, we drew on classical test theory and Rasch modeling, together with correlations with another vocabulary size test and proficiency measures, to assess the CA-TTT’s internal and external validity. Overall, the CA-TTT showed high internal and external validity. Our study highlighted the decisive role of the curriculum in determining vocabulary knowledge in instructed, low-exposure contexts. We discuss how this might contribute to under- or over-estimations of vocabulary size, depending on the relations between the test and curriculum content. Further research using the tool is openly invited, particularly with lower proficiency learners in this context. Following further validation, the test could serve as a tool for assessing high-frequency vocabulary knowledge at beginner-to-low intermediate levels, with due attention paid to alignment with curriculum content.