{"title":"The analysis of marking reliability through the approach of gauge repeatability and reproducibility (GR&R) study: a case of English-speaking test","authors":"Pornphan Sureeyatanapas, Panitas Sureeyatanapas, Uthumporn Panitanarak, Jittima Kraisriwattana, Patchanan Sarootyanapat, Daniel O’Connell","doi":"10.1186/s40468-023-00271-z","DOIUrl":"https://doi.org/10.1186/s40468-023-00271-z","url":null,"abstract":"","PeriodicalId":37050,"journal":{"name":"Language Testing in Asia","volume":"4 8","pages":"1-28"},"PeriodicalIF":2.8,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139438512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EFL teachers’ cognition of social and psychological consequences of high-stake national language tests: role of teacher training workshops","authors":"Rahmatolah Allahyari, Mahmoud Moradi Abbasabady, Shamim Akhter, Goudarz Alibakhshi","doi":"10.1186/s40468-023-00262-0","DOIUrl":"https://doi.org/10.1186/s40468-023-00262-0","url":null,"abstract":"","PeriodicalId":37050,"journal":{"name":"Language Testing in Asia","volume":"18 1","pages":"1-16"},"PeriodicalIF":2.8,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139215304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-11-09 | DOI: 10.1186/s40468-023-00269-7
Dongil Shin, Soohyeon Park, Eunhae Cho
{"title":"Correction: A review study on discourse-analytical approaches to language testing policy in the South Korean context","authors":"Dongil Shin, Soohyeon Park, Eunhae Cho","doi":"10.1186/s40468-023-00269-7","DOIUrl":"https://doi.org/10.1186/s40468-023-00269-7","url":null,"abstract":"","PeriodicalId":37050,"journal":{"name":"Language Testing in Asia","volume":" 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135242270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-11-07 | DOI: 10.1186/s40468-023-00266-w
Abu Nawas, I Gusti Ngurah Darmawan, Nina Maadad
Abstract Differences in English performance between school types have mainly been investigated across Asian countries. However, little is known about which language skills differentiate schools’ overall language achievement. Using a quantitative, comparative design, this study measured the reading and listening skills of 1319 Indonesian students, selected through a stratified sample design and grouped into secular ( Sekolah , n = 726) and Islamic ( Madrasah , n = 593) groups. The sample was drawn from a total population of 9205 secondary school students in Bone Regency, South Sulawesi, Indonesia. Three-way ANOVA results showed significant differences ( p < 0.05) in reading and listening subskills between the groups. The significantly higher scores of Madrasah students in reading and listening subskills indicate that, compared with students attending secular schools, they are better at constructing what a text means in a variety of contexts, both as a literary experience in reading texts and in obtaining general and specific information from listening tests. The poor performance of boys and of students enrolled in public secular schools may be the main explanation for the achievement gaps across the groups. The main and interaction effects of school system, sector, and gender on the tested subskills are also reported. Additionally, a DIF test confirmed the equity of the tested items across the groups.
{"title":"Indonesian secular vs. Madrasah schools: assessing the discrepancy in English reading and listening tests","authors":"Abu Nawas, I Gusti Ngurah Darmawan, Nina Maadad","doi":"10.1186/s40468-023-00266-w","DOIUrl":"https://doi.org/10.1186/s40468-023-00266-w","url":null,"abstract":"Abstract The greater emphasis on the significance and difference in English performance between the school types has mainly been investigated across Asian countries. However, not much is known about what language skills differentiate their overall language achievement. Using a quantitative study with comparative analysis, this study measured the reading and listening skills of 1319 Indonesian students who were selected using a stratified sample design and grouped them into secular ( Sekolah , n = 726) and Islamic ( Madrasah , n = 593) groups. The samples were selected from 9205 of the total population of secondary school students, in Bone Regency, South Sulawesi Indonesia. The three-way ANOVA results showed a significant difference ( p < 0.05) in reading and listening subskills between the groups. Highly significant results of Madrasah students in reading and listening subskills indicate they are better at constructing what text means in a variety of contexts, as a literary experience in reading texts and obtaining general and specific information from listening tests compared to those attending secular schools. Poor performance of boys and students who enrolled in public secular schools may become the main explanation for achievement gaps across the groups. The main and interaction effects of the school system, sectors, and gender on the tested subskills were also explained in this study. Additionally, the result of the DIF test confirmed that the equity of the tested items between them was supported.","PeriodicalId":37050,"journal":{"name":"Language Testing in Asia","volume":"52 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135476490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
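The stratified sampling design described in the abstract above (1319 students drawn from a population of 9205 across two school types) can be sketched with proportional allocation. This is a minimal illustration, not the study's procedure: the field name `school` and the stratum sizes below are assumptions chosen only so the allocation sums to the reported sample size.

```python
import random

def stratified_sample(population, strata_key, n):
    """Draw a proportionally allocated stratified sample of size ~n."""
    # Group population units by stratum
    strata = {}
    for unit in population:
        strata.setdefault(unit[strata_key], []).append(unit)
    total = len(population)
    sample = []
    for members in strata.values():
        # Proportional allocation: each stratum contributes its share of n
        k = round(n * len(members) / total)
        sample.extend(random.sample(members, min(k, len(members))))
    return sample

# Hypothetical population mirroring the study's two school types
population = (
    [{"school": "Sekolah"} for _ in range(5000)]
    + [{"school": "Madrasah"} for _ in range(4205)]
)
sample = stratified_sample(population, "school", 1319)
```

With these (made-up) stratum sizes, proportional allocation yields 716 Sekolah and 603 Madrasah students, totaling 1319.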
Pub Date: 2023-10-31 | DOI: 10.1186/s40468-023-00268-8
Karim Ibrahim
{"title":"Correction: Using AI-based detectors to control AI-assisted plagiarism in ESL writing: “The Terminator Versus the Machines”","authors":"Karim Ibrahim","doi":"10.1186/s40468-023-00268-8","DOIUrl":"https://doi.org/10.1186/s40468-023-00268-8","url":null,"abstract":"","PeriodicalId":37050,"journal":{"name":"Language Testing in Asia","volume":"298 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135814864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-26 | DOI: 10.1186/s40468-023-00265-x
Wenfeng Jia, Peixin Zhang
Abstract It is widely believed that raters’ cognition is an important aspect of writing assessment, as it has both logical and temporal priority over scores. A critical review of previous research in this area shows that raters’ cognition can be boiled down to two fundamental issues: building text images and strategies for articulating scores. Compared to the scoring contexts of previous research, the TEM 8 integrated writing task scoring scale has unique features, so it is important to know how raters build text images and how they articulate scores for those images in the specific context of rating TEM 8 compositions. To answer these questions, the present study conducted qualitative research that treated raters as problem solvers in the light of problem-solving theory. Six highly experienced raters were asked to verbalize their thoughts while rating TEM 8 essays, supplemented by a retrospective interview. Analyzing the collected protocols, we found that, with regard to research question 1, the raters went through two stages, setting building text images as isolated nodes and building holistic text images for each dimension as two sub-goals, respectively. To achieve the first sub-goal, raters used strategies such as single-focus evaluating, diagnosing, and comparing; for the second sub-goal, they mainly used synthesizing and comparing. Regarding the second question, the results showed that the raters resorted to two groups of strategies: demarcating boundaries between scores within a dimension and discriminating between dimensions, each group consisting of more specific processes. Each of the extracted processes was clearly defined and their relationships delineated, on the basis of which a new working model of the rating process was finalized. Overall, the present study deepens our understanding of rating processes and provides evidence for the scoring validity of the TEM 8 integrated writing test. It also has implications for rating practice, such as the need to distinguish between two types of analytical rating scales.
{"title":"Rater cognitive processes in integrated writing tasks: from the perspective of problem-solving","authors":"Wenfeng Jia, Peixin Zhang","doi":"10.1186/s40468-023-00265-x","DOIUrl":"https://doi.org/10.1186/s40468-023-00265-x","url":null,"abstract":"Abstract It is widely believed that raters’ cognition is an important aspect of writing assessment, as it has both logical and temporal priority over scores. Based on a critical review of previous research in this area, it is found that raters’ cognition can be boiled to two fundamental issues: building text images and strategies for articulating scores. Compared to the scoring contexts of previous research, the TEM 8 integrated writing task scoring scale has unique features. It is urgent to know how raters build text images and how they articulate scores for text images in the specific context of rating TEM8 compositions. In order to answer these questions, the present study conducted qualitative research by considering raters as problem solvers in the light of problem-solving theory. Hence, 6 highly experienced raters were asked to verbalize their thoughts simultaneously while rating TEM 8 essays, supplemented by a retrospective interview. Analyzing the collected protocols, we found that with regard to research question 1, the raters went through two stages by setting building text images as isolated nodes and building holistic text images for each dimension as two sub-goals, respectively. In order to achieve the first sub-goal, raters used strategies such as single foci evaluating, diagnosing, and comparing; for the second sub-goal, they mainly used synthesizing and comparing. Regarding the second question, the results showed that they resorted to two groups of strategies: demarcating boundaries between scores within a dimension and discriminating between dimensions, each group consisting of more specific processes. Each of the extracted processes was defined clearly and their relationships were delineated, on the basis of which a new working model of the rating process was finalized. Overall, the present study deepens our understanding of rating processes and provides evidence for the scoring validity of the TEM 8 integrated writing test. It also provides implications for rating practice, such as the need for the distinction between two types of analytical rating scales.","PeriodicalId":37050,"journal":{"name":"Language Testing in Asia","volume":"8 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136381599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-23 | DOI: 10.1186/s40468-023-00263-z
Yunlong Liu, Yaqiong Cui, Hua Yu
Abstract Many Chinese universities are implementing an educational reform that transforms their general English courses into academic English courses. Accordingly, how students’ English ability is assessed should also be reformed. In this test review, we introduce a school-based general academic English summative test developed by English instructors. An argument-based approach was adopted to analyze the test’s validity, drawing on students’ test data and their reflective responses to the test. By including the practices of course instructors and the voices of students, two important test stakeholders, in test design, this review can serve as a practical reference for developing a valid general academic English summative test.
{"title":"Test review: a general academic English summative test","authors":"Yunlong Liu, Yaqiong Cui, Hua Yu","doi":"10.1186/s40468-023-00263-z","DOIUrl":"https://doi.org/10.1186/s40468-023-00263-z","url":null,"abstract":"Abstract Many Chinese universities are implementing an educational reform to transform their general English courses into academic English courses. Accordingly, how to assess students’ English ability should also be reformed. In this test review, we introduce a school-based general academic English summative test developed by English instructors. An argument-based approach was adopted to analyze the test validity by obtaining students’ test data and their reflective responses to the test. This review can provide a practical reference for the development of a valid general academic English summative test by including practices of course instructors and voices of students, two important test stakeholders, in test design.","PeriodicalId":37050,"journal":{"name":"Language Testing in Asia","volume":"44 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135405800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-20 | DOI: 10.1186/s40468-023-00261-1
Soi Kei Ho, Zhengdong Gan
Abstract This comparative study investigated the associations between instructional practices and students’ reading performance in 10 top-performing regions that participated in the Program for International Student Assessment (PISA) 2018. A nationally representative sample of 80,016 15-year-old students from 5 Asian regions (B-S-J-Z [China], Singapore, Macao, Hong Kong, and Korea) and 5 Western regions (Estonia, Canada, Finland, Ireland, and Poland) was included. A secondary analysis of PISA survey and assessment data was conducted. T-test and ANOVA analyses revealed systematic differences in the instructional practices of the 10 regions. B-S-J-Z (China) had significantly higher levels of teacher support, teacher-directed instruction, and teacher stimulation than the other sampled regions. Asian regions tended to have higher levels of teacher support, teacher-directed instruction, teacher feedback, adaptive instruction, and teacher enthusiasm than Western regions, although variation was also found within each group. Hierarchical linear regression (HLR) analyses indicated that reading performance was positively predicted by teacher support, adaptive instruction, teacher stimulation, and teacher enthusiasm, but negatively predicted by teacher-directed instruction and teacher feedback. This study sheds light on effective instructional practices for optimizing students’ reading performance across different cultural contexts.
{"title":"Instructional practices and students’ reading performance: a comparative study of 10 top performing regions in PISA 2018","authors":"Soi Kei Ho, Zhengdong Gan","doi":"10.1186/s40468-023-00261-1","DOIUrl":"https://doi.org/10.1186/s40468-023-00261-1","url":null,"abstract":"Abstract This comparative study investigated the associations between instructional practices and students’ reading performance among 10 top performing regions that participated in the Program for International Student Assessment (PISA) 2018. A nationally representative sample consisting of 80,016 15-year-old students from 5 Asian regions (B-S-J-Z [China], Singapore, Macao, Hong Kong, and Korea) and 5 Western regions (Estonia, Canada, Finland, Ireland, and Poland) were included. A secondary analysis of PISA survey and assessment data was conducted. T test and ANOVA analyses revealed systematic differences in instructional practices of the 10 regions. B-S-J-Z (China) had significantly higher levels of teacher support, teacher-directed instruction, and teacher stimulation than the other sample regions. Asian regions tended to have higher levels of teacher support, teacher-directed instruction, teacher feedback, adaptive instruction, and teacher enthusiasm compared with Western regions, although variations were also found within Asian regions or within Western regions. Hierarchical linear regression (HLR) analyses indicated that reading performance was positively predicted by teacher support, adaptive instruction, teacher stimulation, and teacher enthusiasm, but negatively predicted by teacher-directed instruction and teacher feedback. This study sheds light on the effective instructional practices for optimizing students’ reading performance across different cultural contexts.","PeriodicalId":37050,"journal":{"name":"Language Testing in Asia","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135617037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-16 | DOI: 10.1186/s40468-023-00260-2
Karim Ibrahim
Abstract The release of ChatGPT marked the beginning of a new era of AI-assisted plagiarism that disrupts traditional assessment practices in ESL composition. In the face of this challenge, educators are left with little guidance in controlling AI-assisted plagiarism, especially when conventional methods fail to detect AI-generated texts. One approach to managing AI-assisted plagiarism is using fine-tuned AI classifiers, such as RoBERTa, to identify machine-generated texts; however, the reliability of this approach is yet to be established. To address the challenge of AI-assisted plagiarism in ESL contexts, the present cross-disciplinary descriptive study examined the potential of two RoBERTa-based classifiers to control AI-assisted plagiarism on a dataset of 240 human-written and ChatGPT-generated essays. Data analysis revealed that both platforms could identify AI-generated texts, but their detection accuracy was inconsistent across the dataset.
{"title":"Using AI-based detectors to control AI-assisted plagiarism in ESL writing: “The Terminator Versus the Machines”","authors":"Karim Ibrahim","doi":"10.1186/s40468-023-00260-2","DOIUrl":"https://doi.org/10.1186/s40468-023-00260-2","url":null,"abstract":"Abstract The release of ChatGPT marked the beginning of a new era of AI-assisted plagiarism that disrupts traditional assessment practices in ESL composition. In the face of this challenge, educators are left with little guidance in controlling AI-assisted plagiarism, especially when conventional methods fail to detect AI-generated texts. One approach to managing AI-assisted plagiarism is using fine-tuned AI classifiers, such as RoBERTa, to identify machine-generated texts; however, the reliability of this approach is yet to be established. To address the challenge of AI-assisted plagiarism in ESL contexts, the present cross-disciplinary descriptive study examined the potential of two RoBERTa-based classifiers to control AI-assisted plagiarism on a dataset of 240 human-written and ChatGPT-generated essays. Data analysis revealed that both platforms could identify AI-generated texts, but their detection accuracy was inconsistent across the dataset.","PeriodicalId":37050,"journal":{"name":"Language Testing in Asia","volume":"279 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136077859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
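The evaluation described in the AI-detection abstract above — checking whether a detector's accuracy holds up across subsets of a dataset — can be sketched as follows. The `accuracy_by_class` helper and the prediction values are hypothetical illustrations, not the study's classifiers or data; they simply show how per-class accuracy can diverge even when overall accuracy looks acceptable.

```python
def accuracy_by_class(labels, predictions):
    """Overall and per-class accuracy for a binary AI-text detector."""
    assert len(labels) == len(predictions)
    correct = {"human": 0, "ai": 0}
    total = {"human": 0, "ai": 0}
    for truth, pred in zip(labels, predictions):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    overall = sum(correct.values()) / len(labels)
    per_class = {c: correct[c] / total[c] for c in total if total[c]}
    return overall, per_class

# Hypothetical outcome: strong on AI-generated texts, weaker on human-written ones
labels = ["ai"] * 10 + ["human"] * 10
predictions = ["ai"] * 9 + ["human"] + ["human"] * 6 + ["ai"] * 4

overall, per_class = accuracy_by_class(labels, predictions)
```

In this toy run the detector scores 75% overall but only 60% on human-written texts, i.e. it misclassifies many human essays as AI-generated — the kind of inconsistency across the dataset the abstract reports.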