Heather Buzick (2021). The Relationship Between Praxis® Core Academic Skills for Educators Test and Praxis® Subject Assessment Scores: Validity Coefficients and Differential Prediction Analysis by Race/Ethnicity. ETS Research Report Series, 2021(1), 1–17. https://doi.org/10.1002/ets2.12336

The Praxis® Core Academic Skills for Educators (Core) tests are used in the teacher preparation program admissions process and as part of initial teacher licensure. The purpose of this study was to estimate the relationship between scores on Praxis Core tests and Praxis Subject Assessments and to test for differential prediction by race and ethnicity. Data were drawn from operational test taker records over a period of nearly 5 years. The analysis suggests that Praxis Core tests of reading, writing, and mathematics are moderate predictors of scores from 11 high-volume Praxis Subject Assessments. There was little evidence of differential prediction across White, Black or African American, and Hispanic or Latinx test takers.
Dakota W. Cintron (2021). Methods for Measuring Speededness: Chronology, Classification, and Ensuing Research and Development. ETS Research Report Series, 2021(1), 1–36. https://doi.org/10.1002/ets2.12337

The extent to which a test's time limit alters a test taker's performance is known as speededness. Speeded behavior on a test can take the form of random guessing, leaving a substantial proportion of items unanswered, or generally rushed test taking. Speeded responses do not depend solely on a test taker's ability and are therefore not appropriately modeled by traditional item response theory. The literature on measuring the extent of speededness on a test is extensive and dates back over half a century. Yet simple rules of thumb for measuring speededness, dating back to at least Swineford in 1949, are still in operation, for example, the criterion that 80% of the candidates reach the last item. The purpose of this research report is to provide a chronology and classification of methods for measuring speededness and to discuss ensuing research and development in this area.
Michael T. Kane (2021). Symmetric Least Squares Estimates of Functional Relationships. ETS Research Report Series, 2021(1), 1–14. https://doi.org/10.1002/ets2.12331

Ordinary least squares (OLS) regression provides optimal linear predictions of a dependent variable, y, given an independent variable, x, but OLS regressions are not symmetric or reversible. To obtain optimal linear predictions of x given y, a separate OLS regression in that direction would be needed. This report provides a least squares derivation of the geometric mean (GM) regression line, which is symmetric and reversible, as the line that minimizes a weighted sum of the mean squared errors for y given x and for x given y. It is shown that the GM regression line is symmetric and predicts equally well (or poorly, depending on the absolute value of r_xy) in both directions. The errors of prediction for the GM line are, naturally, larger for the predictions of both x and y than those for the two OLS equations, each of which is optimized specifically for prediction in one direction, but for high values of |r_xy| the difference is not large. The GM line has previously been derived as a special case of principal components analysis and gets its name from the fact that its slope is equal to the geometric mean of the slopes of the OLS regressions of y on x and x on y.
Jonathan Schmidgall, Jaime Cid, Elizabeth Carter Grissom, Lucy Li (2021). Making the Case for the Quality and Use of a New Language Proficiency Assessment: Validity Argument for the Redesigned TOEIC Bridge® Tests. ETS Research Report Series, 2021(1), 1–22. https://doi.org/10.1002/ets2.12335

The redesigned TOEIC Bridge® tests were designed to evaluate test takers' English listening, reading, speaking, and writing skills in the context of everyday adult life. In this paper, we summarize the initial validity argument that supports the use of test scores for the purpose of selection, placement, and evaluation of a test taker's English skills. The validity argument consists of four major claims that provide a coherent narrative about the measurement quality and intended uses of test scores. Each major claim in the validity argument is supported by more specific claims and a summary of supporting evidence. By considering the claims and supporting evidence presented in the validity argument, readers should be able to better evaluate whether the TOEIC Bridge tests are appropriate for their situation.
Ryan Whorton, Debby Almonte, Darby Steiger, Cynthia Robins, Christopher Gentile, Jonas Bertling (2021). Beyond Nuclear Families: Development of Inclusive Student Socioeconomic Status Survey Questions. ETS Research Report Series, 2021(1), 1–25. https://doi.org/10.1002/ets2.12332

Social changes have resulted in an increase of students living in households that do not include both a mother and a father, reducing the efficacy of common survey questionnaire approaches to measuring student socioeconomic status (SES). This paper presents two studies conducted to develop and test a new, more inclusive set of student SES items appropriate for students from a range of household types. In the first study, we held group interviews with 57 students in Grades 4, 8, and 12 who lived in four nontraditional household types. The study goal was, first, to understand how students thought about their household members and learn what they knew about the educational background and employment status of their caregivers and, second, to develop draft items based on these findings. In the second study, we held 51 individual cognitive interviews with a similar sample to evaluate draft item clarity and function. We found that although students may live with a broad range of family members and other adults, they understood the term caregiver to refer to a person who provides resources and support. Students found it easier to answer items when the items included the titles of their caregivers. Our results demonstrate that a customizable approach to measuring student SES allows more students to report information about their caregivers than the current standard of asking about mothers and fathers. We provide recommendations for student SES measurement and potential next steps for research on this topic.
Irshat Madyarov, Vahe Movsisyan, Habet Madoyan, Irena Galikyan, Rubina Gasparyan (2021). New Validity Evidence on the TOEFL Junior® Standard Test as a Measure of Progress. ETS Research Report Series, 2021(1), 1–13. https://doi.org/10.1002/ets2.12334

The TOEFL Junior® Standard test is a tool for measuring the English language skills of students ages 11+ who learn English as an additional language. It is a paper-based multiple-choice test and measures proficiency in three sections: listening, form and meaning, and reading. To date, empirical evidence provides some support for the construct validity of the TOEFL Junior Standard test as a measure of progress. Although this evidence is based on test scores from multiple countries with diverse instructional environments, it does not account for students' instructional experiences. The present paper aims to provide additional evidence by examining the TOEFL Junior Standard test as a progress measure within the same instructional setting. The study took place in an after-school English program in Armenia, a non-English-speaking country. A total of 154 adolescents took the TOEFL Junior Standard test three times with different test forms at intervals of 10 and then 20 instructional weeks (30 weeks in total). The difference in differences (DID) analysis shows that TOEFL Junior is sensitive to learning gains within 20 instructional hours per 10 weeks among A1–A2 level learners on the Common European Framework of Reference (CEFR) scale. However, the data did not provide support for this sensitivity among B1–B2 level learners even though their instructional time was twice as long. Although this methodology offers improved control over students' instructional experiences, it also delimits the results to a specific after-school program and comes with a set of other limitations.
Steven Holtzman, Tamara Minott, Nimmi Devasia, Dessi Kirova, David Klieger (2021). Exploring Diversity in Graduate and Professional School Applications. ETS Research Report Series, 2021(1), 1–19. https://doi.org/10.1002/ets2.12330

Gathering a diverse student body is important for institutions of higher education (IHEs) at the graduate/professional level. However, it is impossible to select a diverse student body from a homogeneous group of candidates. Thus, the aim of this study is to discover the extent to which diversity goals in admissions are precluded by the lack of diversity in the applicant pool. To explore this, the proportions of score reports sent to the 150 largest graduate/professional schools and sent for each major, broken down by gender, race, and socioeconomic status (SES) group, were compared to the corresponding proportions in the overall applicant pool of graduate/professional students. Additionally, differences by gender, race/ethnicity, and SES in the distance graduate/professional school applicants are willing to consider traveling were investigated. Results show that the gender, race, and SES distributions differ across score reports sent to different schools and for different majors, as well as in the distance an applicant is willing to consider traveling for graduate/professional school. The patterns found for gender, racial, and socioeconomic diversity provide possibilities for researchers to work further together with graduate/professional schools to tackle the important challenge of increasing diversity in graduate/professional education.
Kristopher Kyle, Ann Tai Choe, Masaki Eguchi, Geoff LaFlair, Nicole Ziegler (2021). A Comparison of Spoken and Written Language Use in Traditional and Technology-Mediated Learning Environments. ETS Research Report Series, 2021(1), 1–29. https://doi.org/10.1002/ets2.12329

A key piece of a validity argument for a language assessment tool is clear overlap between assessment tasks and the target language use (TLU) domain (i.e., the domain description inference). The TOEFL 2000 Spoken and Written Academic Language (T2K-SWAL) corpus, which represents a variety of academic registers and disciplines in traditional learning environments (e.g., lectures, office hours, textbooks, course packs), has served as an important foundation for the TOEFL iBT® test's domain description inference for more than 15 years. There are, however, signs that the characteristics of the registers that students encounter may be changing. Increasingly, typical university courses include technology-mediated learning environments (TMLEs), such as those represented by course management software and other online educational tools. To ensure that the characteristics of TOEFL iBT test tasks continue to align with the TLU domain, it is important to analyze the registers that are typically encountered in TMLEs. In this study, we address this issue by collecting a relatively large (4.5 million words) corpus of spoken and written TMLE registers across the six primary disciplines represented in T2K-SWAL. This corpus was subsequently tagged for a wide variety of linguistic features, and a multidimensional analysis was conducted to compare and contrast written and spoken language in TMLE and T2K-SWAL. The results indicate that although some similarities exist across spoken and written texts in traditional learning environments and TMLEs, language use also differs across learning environments (and modes) with regard to key linguistic dimensions.
Sooyeon Kim, Michael E. Walker (2021). Comparisons Among Approaches to Link Tests Using Random Samples Selected Under Suboptimal Conditions. ETS Research Report Series, 2021(1), 1–20. https://doi.org/10.1002/ets2.12328

Equating the scores from different forms of a test requires collecting data that link the forms. Problems arise when the test forms to be linked are given to groups that are not equivalent and the forms share no common items by which to measure or adjust for this group nonequivalence. We compared three approaches to adjusting for group nonequivalence in a situation where not only is randomization questionable, but the number of common items is small. Group adjustment through subgroup weighting, a weak anchor, or a mix of both was evaluated in terms of linking accuracy using a resampling approach. We used data from a single test form to create two research forms for which the equating relationship was known. The results showed that the subgroup weighting and weak anchor approaches produced nearly equivalent linking results when group equivalence was not met. Direct (random groups) linking methods produced the least accurate results due to nontrivial bias. Combining subgroup weighting with anchor-based linking improved linking accuracy only marginally over using the weak anchor alone when the degree of group nonequivalence was small.
Ikkyu Choi, Jiangang Hao, Paul Deane, Mo Zhang (2021). Benchmark Keystroke Biometrics Accuracy From High-Stakes Writing Tasks. ETS Research Report Series, 2021(1), 1–13. https://doi.org/10.1002/ets2.12326

Biometrics are physical or behavioral human characteristics that can be used to identify a person. It is widely known that keystroke or typing dynamics for short, fixed texts (e.g., passwords) can serve as a behavioral biometric. In this study, we investigate whether keystroke data from essay responses can lead to a reliable biometric measure, with implications for test security and the monitoring of writing fluency and style changes. Based on keystroke data collected in a high-stakes writing assessment setting, we established a preliminary biometric benchmark for detecting test-taker identity by using features extracted from writing process logs. We report a benchmark keystroke biometric equal error rate of 4.7% for distinguishing same versus different individuals on an essay task. In particular, we show that including writing process features (e.g., features designed to describe the writing process) in addition to the widely used typing-timing features (e.g., features based on the time intervals between two-letter key sequences) improves the accuracy of the keystroke biometrics. The proposed keystroke biometrics can have important implications for writing assessments administered through remotely proctored tests, which have been widely adopted during the COVID-19 pandemic.