
ETS Research Report Series: Latest Publications

Building a Validity Argument for the TOEFL Junior® Tests
Q3 Social Sciences | Pub Date: 2024-05-15 | DOI: 10.1002/ets2.12379
Ching‐Ni Hsieh
The TOEFL Junior® tests are designed to evaluate young language students' English reading, listening, speaking, and writing skills in an English‐medium secondary instructional context. This paper articulates a validity argument constructed to support the use and interpretation of the TOEFL Junior test scores for the purpose of placement, progress monitoring, and evaluation of a test taker's English skills. The validity argument is built within an argument‐based approach to validation and consists of six validity inferences that provide a coherent narrative about the measurement quality and intended uses of the TOEFL Junior test scores. Each validity inference is underpinned by specific assumptions and corresponding evidential support. The claims and supporting evidence presented in the validity argument demonstrate how the TOEFL Junior research program takes a rigorous approach to supporting the uses of the tests. The compilation of validity evidence serves as a resource for score users and stakeholders, guiding them to make informed decisions regarding the use and interpretation of TOEFL Junior test scores within their educational contexts.
Citations: 0
Validity, Reliability, and Fairness Evidence for the JD‐Next Exam
Q3 Social Sciences | Pub Date: 2024-04-10 | DOI: 10.1002/ets2.12378
Steven Holtzman, Jonathan Steinberg, Jonathan Weeks, Christopher Robertson, Jessica Findley, David M Klieger
At a time when institutions of higher education are exploring alternatives to traditional admissions testing, institutions are also seeking to better support students and prepare them for academic success. Under such an engaged model, one may seek to measure not just the accumulated knowledge and skills that students would bring to a new academic program but also their ability to grow and learn through the academic program. To help prepare students for law school before they matriculate, the JD‐Next is a fully online, noncredit, 7‐ to 10‐week course to train potential juris doctor students in case reading and analysis skills. This study builds on the work presented for previous JD‐Next cohorts by introducing new scoring and reliability estimation methodologies based on a recent redesign of the assessment for the 2021 cohort, and it presents updated validity and fairness findings using first‐year grades, rather than merely first‐semester grades as in prior cohorts. Results support the claim that the JD‐Next exam is reliable and valid for predicting law school success, providing a statistically significant increase in predictive power over baseline models, including entrance exam scores and grade point averages. In terms of fairness across racial and ethnic groups, smaller score disparities are found with JD‐Next than with traditional admissions assessments, and the assessment is shown to be equally predictive for students from underrepresented minority groups and for first‐generation students. These findings, in conjunction with those from previous research, support the use of the JD‐Next exam for both preparing and admitting future law school students.
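The incremental predictive power reported here is conventionally established by comparing nested regression models: a baseline with entrance exam scores and grade point averages, and an augmented model that adds the JD‐Next score. Below is a minimal sketch of that comparison in Python on simulated stand-in data; the variable names (lsat, ugpa, jdnext, l1_gpa) and effect sizes are illustrative assumptions, not the study's actual data or model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Simulated stand-in data (hypothetical; the study used real admissions records).
df = pd.DataFrame({
    "lsat": rng.normal(155.0, 8.0, n),   # entrance exam score
    "ugpa": rng.normal(3.3, 0.4, n),     # undergraduate GPA
})
df["jdnext"] = 0.4 * (df["lsat"] - 155.0) / 8.0 + rng.normal(0.0, 1.0, n)
df["l1_gpa"] = (3.0 + 0.10 * (df["lsat"] - 155.0) / 8.0
                + 0.10 * df["jdnext"]
                + rng.normal(0.0, 0.3, n))  # first-year grades (criterion)

# Baseline model: entrance exam score + undergraduate GPA.
base = sm.OLS(df["l1_gpa"], sm.add_constant(df[["lsat", "ugpa"]])).fit()

# Augmented model: baseline predictors plus the JD-Next exam score.
full = sm.OLS(df["l1_gpa"], sm.add_constant(df[["lsat", "ugpa", "jdnext"]])).fit()

# Incremental validity: change in R^2, with an F-test for the added predictor.
print(f"Baseline R^2:  {base.rsquared:.3f}")
print(f"Augmented R^2: {full.rsquared:.3f}")
print(full.compare_f_test(base))  # -> (F statistic, p-value, df difference)
```

A significant F statistic for the nested comparison (equivalently, a significant coefficient on the added score) is what "a statistically significant increase in predictive power over baseline models" refers to.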
Citations: 0
Practical Considerations in Item Calibration With Small Samples Under Multistage Test Design: A Case Study
Q3 Social Sciences | Pub Date: 2024-02-04 | DOI: 10.1002/ets2.12376
Hongwen Guo, Matthew S. Johnson, Daniel F. McCaffrey, Lixong Gu
The multistage testing (MST) design has been gaining attention and popularity in educational assessments. For testing programs with small test‐taker samples, it is challenging to calibrate new items to replenish the item pool. In the current research, we used the item pools from an operational MST program to illustrate how research studies can build on the literature and program‐specific data to help fill the gaps between research and practice and to make sound psychometric decisions that address small‐sample issues. The studies covered the choice of item calibration methods, data collection designs to increase sample sizes, and item response theory models for producing the score conversion tables. Our results showed that, with small samples, the fixed item parameter calibration (FIPC) method consistently performed best for calibrating new items, compared to the traditional separate calibration with scaling method and a newer calibration approach based on minimum discriminant information adjustment. In addition, concurrent FIPC calibration with data from multiple administrations also improved parameter estimation for new items. However, because of program‐specific settings, a simpler model may not improve current practice when the sample size is small and the initial item pools were well calibrated using a two‐parameter logistic model with large field‐trial data.
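For reference, the two‐parameter logistic (2PL) model mentioned above gives the probability that an examinee with ability θ answers item i correctly, and FIPC calibrates new items by holding the anchor items' parameters at their item‐bank values so the new items land on the existing scale. A minimal sketch of both ideas follows; the notation is mine, not the report's.

```latex
% 2PL item response function: a_i = discrimination, b_i = difficulty.
P_i(\theta) = \frac{1}{1 + \exp\left[-a_i(\theta - b_i)\right]}

% FIPC (sketch): maximize the marginal likelihood over the new items'
% parameters only; anchor-item parameters stay fixed at their bank values,
% which anchors the theta scale. Here u_{ij} is person j's scored response
% to item i, and g(theta) is the ability distribution.
(\hat{a}_i, \hat{b}_i)_{i \in \text{new}} =
  \arg\max \sum_{j=1}^{N} \log \int
    \prod_{i \in \text{new} \,\cup\, \text{anchor}}
      P_i(\theta)^{u_{ij}} \left[1 - P_i(\theta)\right]^{1 - u_{ij}}
    \, g(\theta) \, d\theta
```

Separate calibration with scaling, by contrast, estimates all parameters freely and then applies a linear transformation (e.g., Stocking–Lord) to place them on the bank scale.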
Citations: 0
Modeling Writing Traits in a Formative Essay Corpus
Q3 Social Sciences | Pub Date: 2024-01-17 | DOI: 10.1002/ets2.12377
Paul Deane, Duanli Yan, Katherine Castellano, Y. Attali, Michelle Lamar, Mo Zhang, Ian Blood, James V. Bruno, Chen Li, Wenju Cui, Chunyi Ruan, Colleen Appel, Kofi James, Rodolfo Long, Farah Qureshi
This paper presents a multidimensional model of variation in writing quality, register, and genre in student essays, trained and tested via confirmatory factor analysis of 1.37 million essay submissions to ETS' digital writing service, Criterion®. The model was also validated with several other corpora, which indicated that it provides a reasonable fit for essay data from 4th grade to college. It includes an analysis of the test‐retest reliability of each trait, longitudinal trends by trait, both within the school year and from 4th to 12th grades, and analysis of genre differences by trait, using prompts from the Criterion topic library aligned with the major modes of writing (exposition, argumentation, narrative, description, process, comparison and contrast, and cause and effect). It demonstrates that many of the traits are about as reliable as overall e‐rater® scores, that the trait model can be used to build models somewhat more closely aligned with human scores than standard e‐rater models, and that there are large, significant trait differences by genre, consistent with genre differences in trait patterns described in the larger literature. Some of the traits demonstrated clear trends between successive revisions. Students using Criterion appear to have consistently improved grammar, usage, and spelling after getting Criterion feedback and to have marginally improved essay organization. Many of the traits also demonstrated clear grade level trends. These features indicate that the trait model could be used to support more detailed scoring and reporting for writing assessments and learning tools.
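The confirmatory factor analysis underlying the trait model can be stated compactly. As a generic reference sketch (standard CFA notation, not the report's specific trait structure):

```latex
% Measurement model: observed essay features x_j load on latent writing
% traits xi_j through the loading matrix Lambda, with unique errors epsilon_j.
\mathbf{x}_j = \boldsymbol{\Lambda}\,\boldsymbol{\xi}_j + \boldsymbol{\varepsilon}_j

% Implied covariance structure fit to the sample covariance of the features:
% Phi = covariance matrix of the latent traits (traits may correlate);
% Theta = (typically diagonal) covariance of the unique errors.
\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}
```

Model fit is judged by how closely the implied Σ reproduces the observed covariance matrix of the essay features, which is how a factor structure can be trained on one corpus and validated against others.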
Citations: 0
The Use of TOEFL iBT® in Admissions Decisions: Stakeholder Perceptions of Policies and Practices
Q3 Social Sciences | Pub Date: 2024-01-16 | DOI: 10.1002/ets2.12375
Sara T. Cushing, Haoshan Ren, Yi Tan
This paper reports partial results from a larger study of how three different groups of stakeholders—university admissions officers, faculty in graduate programs involved in admissions decisions, and Intensive English Program (IEP) faculty—interpret and use TOEFL iBT® scores in making admissions decisions or preparing students to meet minimum test score requirements. Our overall goal was to gain a better understanding of the perceived role of English language proficiency in admissions decisions and the internal and external factors that inform decisions about acceptable ways to demonstrate proficiency and minimal standards. To that end, we designed surveys for each stakeholder group that contained questions for all groups and questions specific to each group. This report focuses on the questions that were common to all three groups across two areas: (1) understandings of and participation in institutional policy making around English language proficiency tests and (2) knowledge of and attitudes toward the TOEFL iBT test itself. Our results suggested that, as predicted, university admissions staff were the most aware of and involved in policy making but frequently consulted with ESL experts such as IEP faculty when setting policies. This stakeholder group was also the most knowledgeable about the TOEFL iBT test. Faculty in graduate programs varied in their understanding of and involvement in policy making and reported the least familiarity with the test. However, they reported that more information about many aspects of the test would help them make better admissions decisions. The results of the study add to the growing literature on language assessment literacy among various stakeholder groups, especially in terms of identifying aspects of assessment literacy that are important to different groups of stakeholders.
Citations: 1
Culturally Responsive Assessment: Provisional Principles
Q3 Social Sciences | Pub Date: 2023-09-06 | DOI: 10.1002/ets2.12374
Michael E. Walker, Margarita Olivera-Aguilar, Blair Lehman, Cara Laitusis, Danielle Guzman-Orth, Melissa Gholson

Recent criticisms of large-scale summative assessments have claimed that the assessments are biased against historically excluded groups because of the assessments' lack of cultural representation. Accompanying these criticisms is a call for more culturally responsive assessments—assessments that take into account the background characteristics of the students; their beliefs, values, and ethics; their lived experiences; and everything that affects how they learn and behave and communicate. In this paper, we present provisional principles, based on a review of research, that we deem necessary for fostering cultural responsiveness in assessment. We believe the application of these principles can address the criticisms of current assessments.

Citations: 0
Interpretation and Use of a Workplace English Language Proficiency Test Score Report: Perspectives of TOEIC® Test Takers and Score Users in Taiwan
Q3 Social Sciences | Pub Date: 2023-08-23 | DOI: 10.1002/ets2.12373
Ching-Ni Hsieh

Research in validity suggests that stakeholders' interpretation and use of test results should be an aspect of validity. Claims about the meaningfulness of test score interpretations and consequences of test use should be backed by evidence that stakeholders understand the definition of the construct assessed and the score report information. The current study explored stakeholders' uses and interpretations of the score report of a workplace English language proficiency test, the TOEIC® Listening and Reading (TOEIC L&R) test. Online surveys were administered to TOEIC L&R test takers and institutional and corporate score users in Taiwan to collect data about their uses and interpretations of the test score report. Eleven survey respondents participated in follow-up interviews to further elaborate on their uses of the different score reporting information within the stakeholders' respective contexts. Results indicated that the participants used the TOEIC L&R test scores largely as intended by the test developer although some elements of the score report appeared to be less useful and could be confusing for stakeholders. Findings from this study highlight the importance of providing score reporting information with clarity and ease to enhance appropriate use and interpretation.

Citations: 0
Culturally Responsive Personalized Learning: Recommendations for a Working Definition and Framework
Q3 Social Sciences | Pub Date: 2023-08-07 | DOI: 10.1002/ets2.12372
Teresa M. Ober, Blair A. Lehman, Reginald Gooch, Olasumbo Oluwalana, Jaemarie Solyst, Geoffrey Phelps, Laura S. Hamilton

Culturally responsive personalized learning (CRPL) emphasizes the importance of aligning personalized learning approaches with previous research on culturally responsive practices to consider social, cultural, and linguistic contexts for learning. In the present discussion, we briefly summarize two bodies of literature considered in defining and developing a framework for CRPL: technology-enabled personalized learning and culturally relevant, responsive, and sustaining pedagogy. We then provide a definition and framework consisting of six key principles of CRPL, along with a brief discussion of theories and empirical evidence to support these principles. These six principles include agency, dynamic adaptation, connection to lived experiences, consideration of social movements, opportunities for collaboration, and shared power. These principles fall into three domains: fostering flexible student-centered learning experiences, leveraging relevant content and practices, and supporting meaningful interactions within a community. Finally, we conclude with some implications of this framework for researchers, policymakers, and practitioners working to ensure that all students receive high-quality learning opportunities that are both personalized and culturally responsive.

Citations: 0
Using Performance Tasks to Provide Feedback and Assess Progress in Teacher Preparation
Q3 Social Sciences | Pub Date: 2023-07-21 | DOI: 10.1002/ets2.12371
Geoffrey Phelps, Devon Kinsey, Thomas Florek, Nathan Jones

This report presents results from a survey of 64 elementary mathematics and reading language arts teacher educators providing feedback on a new type of short performance task. The performance tasks each present a brief teaching scenario and then require a short performance as if teaching actual students. Teacher educators participating in the study first reviewed six performance tasks, followed by a more in-depth review of two of the tasks. After reviewing the tasks, teacher educators completed an online survey providing input on the value of the tasks and on potential uses to support teacher preparation. The survey responses were positive with the majority of teacher educators supporting a variety of different uses of the performance tasks to support teacher preparation. The report concludes by proposing a larger theory for how the performance tasks can be used as both formative assessment tools to support teacher learning and summative assessments to guide decisions about candidates' readiness for the classroom.

Citations: 0