
ETS Research Report Series: Latest Articles

Robustness of Weighted Differential Item Functioning (DIF) Analysis: The Case of Mantel–Haenszel DIF Statistics
Q3 Social Sciences Pub Date : 2021-08-08 DOI: 10.1002/ets2.12325
Ru Lu, Hongwen Guo, Neil J. Dorans

Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel–Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from measurement invariance (DMI) for two studied groups. Previous research has shown that DIF and DMI do not necessarily agree with each other. In practice, many operational testing programs use the MH DIF procedure to flag potential DIF items. Recently, weighted DIF statistics have been proposed, where weighted sum scores are used as the matching variable and the weights are the item discrimination parameters. It has been shown theoretically and analytically that, given the item parameters, weighted DIF statistics can close the gap between DIF and DMI. The current study investigates the robustness of using weighted DIF statistics empirically through simulations when item parameters have to be estimated from data.
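As a rough illustration of the statistic this abstract refers to, the sketch below computes the MH common odds ratio across matching-score strata and converts it to the D-DIF (delta) metric as -2.35 ln(alpha). It is a minimal sketch and not the authors' implementation: the function names, the "ref"/"focal" group labels, and the binning of weighted sum scores into strata of width `step` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def mh_d_dif(item, group, matching):
    """Mantel-Haenszel D-DIF for one dichotomous item.

    item:     0/1 responses to the studied item
    group:    "ref" or "focal" per examinee (hypothetical labels)
    matching: matching-variable value per examinee (e.g., a sum score)
    """
    # Per-stratum counts: [ref right, ref wrong, focal right, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for x, g, m in zip(item, group, matching):
        idx = (0 if g == "ref" else 2) + (0 if x == 1 else 1)
        tables[m][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    alpha = num / den  # MH common odds ratio; undefined if den == 0
    return -2.35 * math.log(alpha)


def weighted_score(responses, weights, step=0.5):
    """Weighted sum score, rounded to bins of width `step` (an assumption)
    so that it can serve as a discrete matching variable."""
    raw = sum(w * x for w, x in zip(weights, responses))
    return round(raw / step) * step
```

Under the usual convention, a negative D-DIF value indicates that the item is relatively harder for the focal group. For the weighted variant, `matching` would hold `weighted_score(...)` values, with `weights` set to the known or estimated item discrimination parameters.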

Citations: 1
Model Adequacy Checking for Applying Harmonic Regression to Assessment Quality Control
Q3 Social Sciences Pub Date : 2021-08-08 DOI: 10.1002/ets2.12327
Jiahe Qian, Shuhong Li

In recent years, harmonic regression models have been applied to implement quality control for educational assessment data consisting of multiple administrations and displaying seasonality. As with other types of regression models, it is imperative that model adequacy checking and model fit be appropriately conducted. However, there has been no literature on how to perform a comprehensive model adequacy evaluation when applying harmonic regression models to sequential data with seasonality in the educational assessment field. This paper is intended to fill this gap with an illustration of real data from an English language assessment. Two types of cross-validation, leave-one-out and out-of-sample, were designed to measure prediction errors and check model validation. Three types of R-squared and various residual diagnostics were applied to check model adequacy and model fitting.
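To make the idea of leave-one-out cross-validation for a harmonic regression concrete, here is a small sketch. The abstract does not give the model specification, so the form used here (intercept, linear trend, and a single sine/cosine pair with a known period) is an assumption, as are all function names. The model is fitted by ordinary least squares through the normal equations.

```python
import math


def harmonic_row(t, period):
    """Design row: intercept, linear trend, one cosine/sine pair."""
    angle = 2.0 * math.pi * t / period
    return [1.0, float(t), math.cos(angle), math.sin(angle)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(ts, ys, period):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    rows = [harmonic_row(t, period) for t in ts]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def loo_rmse(ts, ys, period):
    """Leave-one-out RMSE: refit without point i, then predict point i."""
    sq = []
    for i in range(len(ts)):
        beta = fit(ts[:i] + ts[i + 1:], ys[:i] + ys[i + 1:], period)
        pred = sum(b * v for b, v in zip(beta, harmonic_row(ts[i], period)))
        sq.append((ys[i] - pred) ** 2)
    return math.sqrt(sum(sq) / len(sq))
```

Here `ts` and `ys` are plain Python lists of administration indices and mean scores. An out-of-sample check would follow the same pattern, holding out a trailing block of administrations instead of a single point.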

Citations: 2
Equitable STEM Instruction and Assessment: Accessibility and Fairness Considerations for Special Populations
Q3 Social Sciences Pub Date : 2021-07-22 DOI: 10.1002/ets2.12324
Danielle Guzman-Orth, Cary A. Supalo, Derrick W. Smith, Okhee Lee, Teresa King

The landscape for STEM instruction is rapidly shifting in the United States. Attention toward STEM instruction and assessment opportunities is increasing. All students must have opportunities to gain access to the STEM content and show what they know and are able to do. We caution that attention to fairness and accessibility is critical for students from special populations, particularly English learners and students with disabilities. Opportunities for equitable access to STEM instruction and assessment are diminished without accessibility. In this report, we use an assets-based perspective to discuss and reframe common misconceptions and challenges as opportunities. We argue that attention to accessibility at the onset of STEM instruction and assessment is the pivotal foundation for fair opportunities in STEM. We highlight key opportunities and conclude with recommendations for improved fairness and access in STEM.

Citations: 2
Assessing Mode Effects of At-Home Testing Without a Randomized Trial
Q3 Social Sciences Pub Date : 2021-07-20 DOI: 10.1002/ets2.12323
Sooyeon Kim, Michael Walker

In this investigation, we used real data to assess potential differential effects associated with taking a test in a test center (TC) versus testing at home using remote proctoring (RP). We used a pseudo-equivalent groups (PEG) approach to examine group equivalence at the item level and the total score level. If our assumption holds that the PEG approach removes between-group ability differences (as measured by the test) reasonably well, then a plausible explanation for any systematic differences in performance between TC and RP groups that remain after applying the PEG approach would be the operation of test mode effects. At the item level, we compared item difficulties estimated using the PEG approach (i.e., adjusting only for ability differences between groups) to those estimated via delta equating (i.e., adjusting for any systematic differences between groups). All tests used in this investigation showed small, nonsystematic differences, providing evidence of trivial effects associated with at-home testing. At the total score level, we linked the RP group scores to the TC group scores after adjusting for group differences using demographic covariates. We then compared the resulting RP group conversion to the original TC group conversion (the criterion in this study). The magnitude of differences between the RP conversion and the TC conversion was small, leading to the same pass/fail decision for most RP examinees. The present analyses seem to suggest little to no mode effects for the tests used in this investigation.

Citations: 5
The Effects of Extended Planning Time on Candidates' Performance, Processes, and Strategy Use in the Lecture Listening-Into-Speaking Tasks of the TOEFL iBT® Test
Q3 Social Sciences Pub Date : 2021-06-21 DOI: 10.1002/ets2.12322
Chihiro Inoue, Daniel M. K. Lam

This study investigated the effects of two different planning time conditions (i.e., operational [20 s] and extended length [90 s]) for the lecture listening-into-speaking tasks of the TOEFL iBT® test for candidates at different proficiency levels. Seventy international students based in universities and language schools in the United Kingdom (35 at a lower level; 35 at a higher level) participated in the study. The effects of different lengths of planning time were examined in terms of (a) the scores given by ETS-certified raters; (b) the quality of the speaking performances characterized by accurately reproduced idea units and the measures of complexity, accuracy, and fluency; and (c) self-reported use of cognitive and metacognitive processes and strategies during listening, planning, and speaking. The results found neither a statistically significant main effect of the length of planning time nor an interaction between planning time and proficiency on the scores or on the quality of the speaking performance. There were several cognitive and metacognitive processes and strategies where significantly more engagement was reported under the extended planning time, which suggests enhanced cognitive validity of the task. However, the increased engagement in planning did not lead to any measurable improvement in the score. Therefore, in the interest of practicality, the results of this study provide justifications for the operational length of planning time for the lecture listening-into-speaking tasks in the speaking section of the TOEFL iBT test.

Citations: 2
Identifying Teachers' Needs for Results From Interim Unit Assessments
Q3 Social Sciences Pub Date : 2021-05-03 DOI: 10.1002/ets2.12320
Priya Kannan, Andrew D. Bryant, Shiyi Shao, E. Caroline Wylie

Interim assessments have been defined variously in different contexts and can be used for predictive purposes or instructional purposes. In this paper, we present results from a study where we evaluated reporting needs for interim assessments designed for instructional purposes and intended to be used at the end of defined curriculum units. Results from such unit assessments should help teachers determine gaps in student understanding and inform ongoing instructional decision-making. Our goal was to determine if learning progressions (LPs) could serve as the cognitive lens through which teachers can evaluate how their students' understanding of key constructs improves through periodic unit assessments. Therefore, we used the LP framework in mathematics and the key practices (KP) framework for English language arts (ELA) to design preliminary teacher report mock-ups for these unit assessments. Within a utilization-oriented evaluation framework, we conducted six needs-assessment focus groups with elementary and middle school mathematics (n = 12) and ELA (n = 11) teachers to specifically evaluate the extent to which they find results presented within the LP and KP frameworks understandable and useful for their instructional practice. Results from the focus groups show teachers' overall needs for types of information sought from unit assessment reports, the extent to which teachers are familiar with the LP and KP frameworks, their interpretations (including confusions) of the information presented in the preliminary mock-ups, and their additional needs for reports from unit assessments to be instructionally useful.

The study addresses the following broad research questions: (a) What types of unit assessments do these teachers currently use, and what types of assessment reports (if any) do they receive? (b) What are the top needs of elementary and middle school ELA and mathematics teachers for unit assessment results? (c) To what extent are results presented within the LP/KP frameworks understandable and useful to teachers? (d) To what extent do the preliminary mock-ups meet teachers' most important needs for unit assessment results? What is missing, and what can be improved?
Citations: 2
The Development and Evaluation of Interactional Competence Elicitor for Oral Language Assessments
Q3 Social Sciences Pub Date : 2021-04-14 DOI: 10.1002/ets2.12319
Evgeny Chukharev-Hudilainen, Gary J. Ockey

This paper describes the development and evaluation of Interaction Competence Elicitor (ICE), a spoken dialog system (SDS) for the delivery of a paired oral discussion task in the context of language assessment. The purpose of ICE is to sustain a topic-specific conversation with a test taker in order to elicit discourse that can be later judged to assess the test taker's oral language ability, including interactional competence. The development of ICE is reported in detail to provide guidance for future developers of similar systems. The performance of ICE is evaluated on two aspects: (a) by analyzing system errors that occur at different stages in the natural language processing (NLP) pipeline in terms of both their preventability and their impact on the downstream stages of the pipeline, and (b) by analyzing questionnaire and semistructured interview data to establish the test takers' experience with the system. Findings suggest that ICE was robust in 90% of the dialog turns it produced, and test takers noted both positive and negative aspects of communicating with the system as opposed to a human interlocutor. We conclude that this prototype system lays important groundwork for the development and use of specialized SDSs in the assessment of oral communication, which includes interactional competence.

Citations: 6
Career and Technical Education as a Conduit for Skilled Technical Careers: A Targeted Research Review and Framework for Future Research
Q3 Social Sciences Pub Date : 2021-04-11 DOI: 10.1002/ets2.12318
Sara Haviland, Steven Robbins

Workforce development and career and technical education (CTE) have long provided reliable pathways to middle skill jobs and a gateway to the middle class. Given recent changes in middle skills jobs, the education landscape, and federal policy priorities, the role of CTE in the U.S. educational landscape is evolving more rapidly, encompassing a broader range of education, and practices are changing ahead of research. The first part of this report provides an overview of the current state of CTE in the United States, as well as the state of CTE research, and presents an argument for a broader definition of CTE that incorporates workforce development through postsecondary institutions. The second part provides operational definitions and typologies to facilitate future research. Our aim is to build a research framework for CTE that is grounded in a normative path through CTE: getting in (preparation and recruitment), getting through (retention and skill acquisition), getting out (completion and initial hire), and getting on (career progression). Key challenges and priorities for future research are discussed.

Citations: 4
Certified to Evaluate: Exploring Administrator Accuracy and Beliefs in Teacher Observation
Q3 Social Sciences Pub Date : 2021-03-29 DOI: 10.1002/ets2.12316
Nathan Jones, Courtney Bell, Yi Qi, Jennifer Lewis, David Kirui, Leslie Stickler, Amanda Redash

The observation systems being used in all 50 states require administrators to learn to accurately and reliably score their teachers' instruction using standardized observation systems. Although the literature on observation systems is growing, relatively few studies have examined the outcomes of trainings focused on developing administrators' accuracy using observation systems and the administrators' perceptions of that training. Therefore, the focus of this study is on examining administrators' efforts to become accurate and reliable within the context of a comprehensive teacher evaluation reform. This study was conducted during the year-long training and implementation of a new observation system in the context of a large urban district's teacher evaluation reform. The study brings together data on the outcomes of the district training—results on a certification exercise from all administrators in the district—with two sources of data on administrators' perceptions and beliefs. Specifically, we collected fall and spring survey data from nearly 300 administrators and longitudinal interview data from a subsample of 24 administrators. Taken together, these data allowed us to investigate administrators' responses to training and low-stakes practice using the observation process over 1 year. At the end of initial training, administrators reported high levels of learning, particularly in domains aligned with the focus of training. Over the year, administrators reported increased facility with the routines of conducting observations, but they still expressed learning needs, many related to the content of the observation framework. However, results from the training certification test suggested lower than desired levels of accuracy and reliability; administrators regularly did not agree with each other or with master raters. 
The certification test results suggested that even with a significant investment in administrator learning, there was more to be learned and mastered. If we hope for teacher evaluation to lead to the types of changes in teaching and learning that reformers have envisioned, policymakers and practitioners alike will need to devote time and resources to supporting administrator learning in initial training and throughout administrator use in practice.

Citations: 1
Researching Academic Reading in Two Contrasting English as a Medium of Instruction Contexts at a University Level
Q3 Social Sciences Pub Date : 2021-03-29 DOI: 10.1002/ets2.12317
Nathaniel Owen, Prithvi N. Shrestha, Anna Kristina Hultgren

This project examined academic reading in two contrasting English as a medium of instruction (EMI) university settings in Nepal and Sweden and the unique challenges facing students who are studying in a language other than their primary language. The motivation for the project was to explore the role of high-stakes testing in EMI contexts and the implications for the design of the TOEFL iBT® test. We employed a sequential mixed-methods approach to gather substantive and authentic qualitative data from stakeholders immersed in EMI settings. A small sample of students (Nepal = 19, Sweden = nine) were asked to complete reading logs over a period of 3 weeks so we could determine the types of texts and reading load associated with diverse EMI settings. Additionally, a larger cohort of students from each setting (Nepal = 69, Sweden = 60) completed questionnaires examining academic reading demands, reading skills, and practices. Students who completed the questionnaires also completed the reading section of the TOEFL iBT test. The same students also completed a TOEFL® family of tests suitability questionnaire so we could consider the suitability of the TOEFL iBT test for EMI contexts. Following test completion, a series of semistructured interviews (Nepal = 21, Sweden = 23) focused more closely on students' perspectives of reading demands in their academic contexts and the suitability of the reading section of the TOEFL iBT test to make claims about readiness to study in EMI contexts. Our findings revealed that different EMI contexts have different standards of high and low academic reading proficiency and that these differences may occur due to differences in educational experiences of the respective cohorts. The findings offer important new insights into academic reading and assessment in EMI contexts. Students in EMI contexts are sensitive to violations of expectations regarding test-taking experiences (face validity). 
The study has implications for the design of test tasks, which should consider local, contextual varieties of English.

Citations: 7