
Language Testing: Latest Publications

Evaluating methodological enhancements to the Yes/No Angoff standard-setting method in language proficiency assessment
IF 4.1 | Tier 1 (Literature) | Q1 (Arts and Humanities) | Pub Date: 2024-02-12 | DOI: 10.1177/02655322231222600
Tia M. Fechter, Heeyeon Yoon
This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to identify low-cost modifications to the existing Yes/No Angoff method that would increase the validity and reliability of the recommended cut scores, using a convergent mixed-methods study design. The study used the Yes/No ratings as the baseline method in two rounds of ratings, while differentiating the two methods by incorporating item maps and an Ordered Item Booklet, integral tools of the Mapmark and Bookmark methods, respectively. The results showed that the internal validity evidence was similar across both methods, especially after Round 2 ratings. When procedural validity evidence was considered, however, a preference emerged for the method in which panelists conducted the initial ratings blind to the empirical item difficulty information, which was then provided on an item map as part of the Round 1 feedback. The findings highlight the importance of evaluating both internal and procedural validity evidence when considering standard-setting methods.
Citations: 0
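In the Yes/No Angoff method, each panelist judges for every item whether a minimally competent candidate would answer it correctly; a panelist's implied cut score is simply the count of "Yes" ratings, and the panel recommendation is typically the mean (or median) across panelists. Below is a minimal sketch of that arithmetic on hypothetical ratings; the panel size, ratings, and aggregation rule are illustrative assumptions, not details from the study.

```python
import numpy as np

# Hypothetical Yes/No Angoff ratings: rows = panelists, columns = items.
# 1 = "Yes, a minimally competent candidate would answer this item correctly."
ratings = np.array([
    [1, 1, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1, 0, 0],
])

# Each panelist's implied cut score is the expected raw score of a
# minimally competent candidate: the count of "Yes" ratings.
panelist_cuts = ratings.sum(axis=1)

# The panel-recommended cut score: here, the mean across panelists.
cut_score = panelist_cuts.mean()

# The spread across panelists is one piece of internal validity evidence;
# a smaller SD after Round 2 would suggest convergence in the ratings.
print(f"Panelist cut scores: {panelist_cuts}")
print(f"Recommended cut score: {cut_score:.2f} of {ratings.shape[1]} items")
print(f"SD across panelists: {panelist_cuts.std(ddof=1):.2f}")
```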
A shortened test is feasible: Evaluating a large-scale multistage adaptive English language assessment
IF 4.1 | Tier 1 (Literature) | Q1 (Arts and Humanities) | Pub Date: 2024-02-07 | DOI: 10.1177/02655322231225426
Shangchao Min, Kyoungwon Bishop
This paper evaluates the multistage adaptive test (MST) design of a large-scale academic language assessment (ACCESS) for Grades 1–12, with the aim of simplifying the current MST design, using both operational and simulated test data. Study 1 explored the operational population data (1,456,287 test-takers) of the listening and reading tests of MST ACCESS in the 2018–2019 school year to evaluate the MST design in terms of measurement efficiency and precision. Study 2 was a simulation study conducted to find an optimal MST design by manipulating the number of items per stage and the panel structure. The operational test data showed that the test length for both the listening and reading tests could be shortened to six folders (i.e., 18 items), with final ability estimates and reliability coefficients comparable, with only slight differences, to those of the current test. The simulation study showed that all six proposed MST designs yielded slightly better measurement accuracy and efficiency than the current design, with the 1-3-3 MST design, which places more items at earlier stages, ranking first. The findings of this study carry implications for the evaluation of MST designs and for ways to optimize MST designs in language assessment.
Citations: 0
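A 1-3-3 MST panel routes each test-taker from a single routing module to one of three modules at each of two subsequent stages, based on interim performance. The sketch below simulates that routing under a Rasch model with a number-correct routing rule; the module difficulties, routing thresholds, and equal six-item modules are illustrative assumptions (six items per module matches the 18-item shortened length, but the paper's preferred design allocates more items to earlier stages, and the operational design is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(7)

def rasch_response(theta, b):
    """Simulate dichotomous item responses under the Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return (rng.random(b.shape) < p).astype(int)

# Hypothetical 1-3-3 panel: one routing module, then a choice of three
# modules at each of two adaptive stages. Difficulties are illustrative.
router = np.linspace(-1.0, 1.0, 6)
stage_modules = {
    "easy":   np.linspace(-2.0, -0.5, 6),
    "medium": np.linspace(-0.75, 0.75, 6),
    "hard":   np.linspace(0.5, 2.0, 6),
}

def route(num_correct, n_items):
    """Number-correct routing rule (an assumption for this sketch)."""
    frac = num_correct / n_items
    return "easy" if frac < 1 / 3 else ("hard" if frac > 2 / 3 else "medium")

def administer_mst(theta):
    """Administer the router plus two adaptive stages."""
    items, responses = [router], [rasch_response(theta, router)]
    for _ in range(2):
        module = stage_modules[route(responses[-1].sum(), len(items[-1]))]
        items.append(module)
        responses.append(rasch_response(theta, module))
    return np.concatenate(items), np.concatenate(responses)

def ml_theta(b, x, iters=20):
    """Newton-Raphson ML ability estimate; clipped to guard against
    divergence for all-correct or all-incorrect response strings."""
    theta = 0.0
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(theta - b)))
        theta = float(np.clip(theta + np.sum(x - p) / np.sum(p * (1 - p)), -5, 5))
    return theta

b, x = administer_mst(theta=0.4)
print(f"Administered {len(b)} items; raw score {x.sum()}; "
      f"theta-hat = {ml_theta(b, x):.2f}")
```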
Setting standards for a diagnostic test of aviation English for student pilots
IF 4.1 | Tier 1 (Literature) | Q1 (Arts and Humanities) | Pub Date: 2024-02-06 | DOI: 10.1177/02655322231224051
Maria Treadaway, John Read
Standard-setting is an essential component of test development, supporting the meaningfulness and appropriate interpretation of test scores. However, in the high-stakes testing environment of aviation, standard-setting studies are underexplored. To address this gap, we document two stages in the standard-setting procedures for the Overseas Flight Training Preparation Test (OFTPT), a diagnostic English test for ab initio pilots aligned to the International Civil Aviation Organization's (ICAO) Language Proficiency Rating Scale (LPRS). Performance-level descriptors (PLDs) were empirically generated in Stage 1 in collaboration with six subject matter experts (SMEs). These PLDs made explicit the correspondence between linguistic performance levels within the target language use domain and the ICAO scale. Findings suggest that the ICAO scale is not fine-grained enough to distinguish levels of linguistic readiness among ab initio pilots, nor does it adequately reflect the knowledge, skills, and abilities valued by SMEs within this domain. In Stage 2, 12 SMEs were recruited to set standards and were divided into two groups to investigate the replicability of the Ebel method standard-setting procedures. Cut scores were determined for the OFTPT reading and listening tests and inferentially linked to the LPRS. There were no significant differences between the cut scores arrived at by the two groups, and reliability was excellent, suggesting that test users can have confidence in the standards set.
Citations: 0
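In the Ebel method, judges sort items into a difficulty-by-relevance grid and agree on an expected percent-correct for borderline candidates in each cell; a judge's cut score is the sum of those expected proportions across items, and two independent panels can be compared to check replicability. A minimal sketch, with an entirely hypothetical grid, item classification, and panel results:

```python
import numpy as np

# Hypothetical Ebel grid: the panel agrees an expected percent-correct for
# borderline candidates in each difficulty x relevance cell. Values are
# illustrative only, not those used for the OFTPT.
expected_pct = {
    ("easy", "essential"): 90, ("easy", "important"): 80,
    ("medium", "essential"): 70, ("medium", "important"): 60,
    ("hard", "essential"): 50, ("hard", "important"): 40,
}

# One judge's classification of a hypothetical 10-item test.
judge_grid = [
    ("easy", "essential"), ("easy", "important"), ("medium", "essential"),
    ("medium", "essential"), ("medium", "important"), ("hard", "important"),
    ("hard", "essential"), ("easy", "essential"), ("medium", "important"),
    ("hard", "important"),
]

# The judge's cut score is the sum of expected proportions correct.
cut = sum(expected_pct[cell] for cell in judge_grid) / 100
print(f"Judge's cut score: {cut:.1f} of {len(judge_grid)} items")

# With two independent panels (as in Stage 2), comparing mean cut scores
# checks the replicability of the procedure.
panel_a = np.array([6.2, 6.5, 6.4, 6.1, 6.6, 6.3])
panel_b = np.array([6.4, 6.3, 6.5, 6.2, 6.4, 6.6])
print(f"Panel A mean {panel_a.mean():.2f}, Panel B mean {panel_b.mean():.2f}")
```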
Korean Syntactic Complexity Analyzer (KOSCA): An NLP application for the analysis of syntactic complexity in second language production
IF 4.1 | Tier 1 (Literature) | Q1 (Arts and Humanities) | Pub Date: 2024-02-06 | DOI: 10.1177/02655322231222596
Haerim Hwang, Hyunwoo Kim
Given the lack of computational tools available for assessing second language (L2) production in Korean, this study introduces a novel automated tool, the Korean Syntactic Complexity Analyzer (KOSCA), for measuring syntactic complexity in L2 Korean production. An open-source tool with a graphical user interface (GUI) developed in Python, KOSCA provides seven indices of syntactic complexity, including both traditional and Korean-specific ones. Its validity was tested by investigating whether the syntactic complexity indices it measures in L2 written and spoken production could explain the variability in L2 Korean learners' proficiency. The results of mixed-effects regression analyses showed that all seven indices significantly accounted for learner proficiency in Korean. Subsequent stepwise multiple regression analyses revealed that the syntactic complexity indices explained 56.0% of the total variance in proficiency for the written data and 54.4% for the spoken data. These findings underscore the validity of the syntactic complexity indices measured by KOSCA as reliable indicators of L2 Korean proficiency; the tool can serve as a valuable resource for researchers and educators in the field of L2 Korean learning and assessment.
Citations: 0
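Traditional syntactic complexity indices of the kind KOSCA reports are typically length- or ratio-based, and validation studies then regress proficiency on the indices. The sketch below computes one simplified length index over space-delimited Korean units (eojeol) and a toy variance-explained figure; it is not KOSCA's implementation, and the index definition, sample text, and regression data are all hypothetical.

```python
import re
import numpy as np

def mean_sentence_length(text: str) -> float:
    """Mean sentence length in eojeol (space-delimited units): a simplified
    stand-in for one traditional complexity index, not KOSCA's own code."""
    sentences = [s for s in re.split(r"[.!?。]\s*", text) if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)

sample = "저는 어제 친구와 영화를 봤어요. 영화가 재미있어서 다시 보고 싶어요."
print(f"Mean sentence length: {mean_sentence_length(sample):.2f} eojeol")

# Toy regression of proficiency on two hypothetical indices, reporting R^2
# (analogous in form to the variance-explained figures in the abstract).
X = np.array([[8.2, 1.4], [10.5, 1.9], [12.1, 2.3], [9.0, 1.6], [13.4, 2.6]])
y = np.array([2.0, 3.0, 4.0, 2.5, 4.5])
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ beta
r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
print(f"R^2 = {r2:.3f}")
```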
The development of a Chinese vocabulary proficiency test (CVPT) for learners of Chinese as a second/foreign language
IF 4.1 | Tier 1 (Literature) | Q1 (Arts and Humanities) | Pub Date: 2024-01-10 | DOI: 10.1177/02655322231219998
Haiwei Zhang, Peng Sun, Yaowaluk Bianglae, Winda Widiawati
To address the needs of the continually growing number of Chinese language learners, the present study developed a 100-item Chinese vocabulary proficiency test (CVPT) for learners of Chinese as a second/foreign language (CS/FL) and presented its initial validation using Item Response Theory, with 170 CS/FL learners from Indonesia and 354 from Thailand. Participants were required to translate or explain the meanings of the Chinese words in Indonesian or Thai. The results provided preliminary evidence for the construct validity of the CVPT for measuring CS/FL learners' receptive Chinese vocabulary knowledge in terms of content, substantive, structural, generalizability, and external aspects. The translation-based CVPT was an attempt to measure CS/FL learners' vocabulary proficiency by exploring their performance in a vocabulary translation task, potentially revealing the depth of test-takers' vocabulary knowledge. Such a CVPT could be useful for Chinese vocabulary instruction and for designing future Chinese vocabulary measurement tools.
Citations: 0
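Item Response Theory calibration of a dichotomously scored test can be illustrated with a Rasch model. Below is a minimal joint-maximum-likelihood sketch on simulated responses; operational calibration would use marginal or conditional estimation with proper treatment of extreme scores, and the simulated data bear no relation to the actual CVPT.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated dichotomous responses (persons x items), a stand-in for
# translation-task scoring (1 = word meaning conveyed correctly).
n_persons, n_items = 200, 20
true_theta = rng.normal(0.0, 1.0, n_persons)
true_b = np.linspace(-2.0, 2.0, n_items)
prob = 1 / (1 + np.exp(-(true_theta[:, None] - true_b[None, :])))
X = (rng.random(prob.shape) < prob).astype(float)

# Joint maximum likelihood for the Rasch model: alternating Newton steps
# for person abilities (theta) and item difficulties (b).
theta = np.zeros(n_persons)
b = np.zeros(n_items)
for _ in range(50):
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    step = (X - p).sum(axis=1) / (p * (1 - p)).sum(axis=1)
    theta = np.clip(theta + step, -4, 4)  # guard against extreme-score divergence
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    b -= (X - p).sum(axis=0) / (p * (1 - p)).sum(axis=0)
    b -= b.mean()  # identify the scale by centering difficulties

print(f"Correlation of estimated and generating difficulties: "
      f"{np.corrcoef(b, true_b)[0, 1]:.3f}")
```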
Open Science should be welcomed by test providers but grounded in pragmatic caution: A response to Winke
IF 4.1 | Tier 1 (Literature) | Q1 (Arts and Humanities) | Pub Date: 2024-01-03 | DOI: 10.1177/02655322231223105
Tony Clark, Emma Bruce
This article is temporarily under embargo.
引用次数: 0
Purposeful turns for more equitable and transparent publishing in language testing and assessment
IF 4.1 | Tier 1 (Literature) | Q1 (Arts and Humanities) | Pub Date: 2024-01-01 | DOI: 10.1177/02655322231203234
Talia Isaacs, Paula M. Winke
This Editorial comes at a time when the after-effects of the acute phase of the COVID-19 pandemic are still being felt but when, in most countries around the world, there has been some easing of restrictions and a return to (quasi-)normalcy. In the language testing and assessment community, many colleagues relished the opportunity to meet and participate in events at the 44th annual Language Testing Research Colloquium (LTRC) in New York in July 2023. This was after 4 years of LTRC exclusively being held online due to public health concerns, restrictions on movement, and other policy-related and logistical matters. In the context of this Editorial, which comes out annually, we find it liberating to be able to focus on matters that are non-pandemic related. In terms of the day-to-day business of managing the journal, we have moved beyond a time of crisis, as reflected in the removal of a note about pandemic effects in our Author and Reviewer e-mail invitation templates. In this annual address, we note a change of the guard in the editorial team that will have come into effect by the time this Editorial is published and some elements of continuity. We also reflect on developments over the past year while briefly touching on what lies ahead.
Citations: 1