Evaluating methodological enhancements to the Yes/No Angoff standard-setting method in language proficiency assessment
Tia M. Fechter, Heeyeon Yoon
Pub Date: 2024-02-12 | DOI: 10.1177/02655322231222600
This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to identify low-cost modifications to the existing Yes/No Angoff method that would increase the validity and reliability of the recommended cut scores, using a convergent mixed-methods study design. The study used the Yes/No ratings as the baseline method in two rounds of ratings, differentiating the two methods by incorporating item maps and an Ordered Item Booklet, integral tools of the Mapmark and Bookmark methods, respectively. The results showed that internal validity evidence was similar across both methods, especially after the Round 2 ratings. When procedural validity evidence was considered, however, a preference emerged for the method in which panelists made their initial ratings without access to empirical item difficulty information, which was then provided on an item map as part of the Round 1 feedback. The findings highlight the importance of evaluating both internal and procedural validity evidence when considering standard-setting methods.
A shortened test is feasible: Evaluating a large-scale multistage adaptive English language assessment
Shangchao Min, Kyoungwon Bishop
Pub Date: 2024-02-07 | DOI: 10.1177/02655322231225426
This paper evaluates the multistage adaptive test (MST) design of a large-scale academic language assessment (ACCESS) for Grades 1–12, with the aim of simplifying the current MST design, using both operational and simulated test data. Study 1 explored the operational population data (1,456,287 test-takers) of the listening and reading tests of MST ACCESS in the 2018–2019 school year to evaluate the MST design in terms of measurement efficiency and precision. Study 2 was a simulation study conducted to find an optimal MST design by manipulating the number of items per stage and the panel structure. The results from the operational test data showed that the test length for both the listening and reading tests could be shortened to six folders (i.e., 18 items), yielding final ability estimates and reliability coefficients comparable, with only slight differences, to those of the current test. The simulation study showed that all six proposed MST designs yielded slightly better measurement accuracy and efficiency than the current design, among which the 1-3-3 MST design with more items at earlier stages ranked first. The findings of this study have implications for the evaluation of MST designs and for ways to optimize MST designs in language assessment.
Setting standards for a diagnostic test of aviation English for student pilots
Maria Treadaway, John Read
Pub Date: 2024-02-06 | DOI: 10.1177/02655322231224051
Standard-setting is an essential component of test development, supporting the meaningfulness and appropriate interpretation of test scores. However, in the high-stakes testing environment of aviation, standard-setting studies are underexplored. To address this gap, we document two stages in the standard-setting procedures for the Overseas Flight Training Preparation Test (OFTPT), a diagnostic English test for ab initio pilots, aligned to the International Civil Aviation Organization (ICAO) Language Proficiency Rating Scale (LPRS). Performance-level descriptors (PLDs) were empirically generated in Stage 1 in collaboration with six subject matter experts (SMEs). These PLDs made explicit the correspondence between linguistic performance levels within the target language use domain and the ICAO scale. Findings suggest that the ICAO scale is not fine-grained enough to distinguish levels of linguistic readiness among ab initio pilots, nor does it adequately reflect the knowledge, skills, and abilities valued by SMEs within this domain. In Stage 2, 12 SMEs were recruited to set standards and were divided into two groups to investigate the replicability of Ebel method standard-setting procedures. Cut scores were determined for the OFTPT reading and listening tests, which were inferentially linked to the LPRS. There were no significant differences between the cut scores arrived at by the two groups, and reliability was excellent, suggesting that test users can have confidence in the standards set.
Korean Syntactic Complexity Analyzer (KOSCA): An NLP application for the analysis of syntactic complexity in second language production
Haerim Hwang, Hyunwoo Kim
Pub Date: 2024-02-06 | DOI: 10.1177/02655322231222596
Given the lack of computational tools available for assessing second language (L2) production in Korean, this study introduces a novel automated tool, the Korean Syntactic Complexity Analyzer (KOSCA), for measuring syntactic complexity in L2 Korean production. An open-source graphical user interface (GUI) application developed in Python, KOSCA provides seven indices of syntactic complexity, including both traditional and Korean-specific ones. Its validity was tested by investigating whether the syntactic complexity indices it measures in L2 written and spoken production could explain variability in L2 Korean learners' proficiency. The results of mixed-effects regression analyses showed that all seven indices significantly accounted for learner proficiency in Korean. Subsequent stepwise multiple regression analyses revealed that the syntactic complexity indices explained 56.0% of the total variance in proficiency for the written data and 54.4% for the spoken data. These findings underscore the validity of the syntactic complexity indices measured by KOSCA as reliable indicators of L2 Korean proficiency, and the tool can serve as a valuable resource for researchers and educators in the field of L2 Korean learning and assessment.
The development of a Chinese vocabulary proficiency test (CVPT) for learners of Chinese as a second/foreign language
Haiwei Zhang, Peng Sun, Yaowaluk Bianglae, Winda Widiawati
Pub Date: 2024-01-10 | DOI: 10.1177/02655322231219998
To address the needs of the continually growing number of Chinese language learners, the present study developed a 100-item Chinese vocabulary proficiency test (CVPT) for learners of Chinese as a second/foreign language (CS/FL) and presented its initial validation, using Item Response Theory with 170 CS/FL learners from Indonesia and 354 CS/FL learners from Thailand. Participants were required to translate or explain the meanings of the Chinese words in Indonesian or Thai. The results provided preliminary evidence for the construct validity of the CVPT for measuring CS/FL learners' receptive Chinese vocabulary knowledge in terms of content, substantive, structural, generalizability, and external aspects. The translation-based CVPT measures CS/FL learners' vocabulary proficiency through their performance on a vocabulary translation task, potentially revealing the depth of test-takers' vocabulary knowledge. Such a test could be useful for Chinese vocabulary instruction and for designing future Chinese vocabulary measurement tools.