
Journal of Educational Measurement: Latest Publications

Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-06-09 | DOI: 10.1111/jedm.12372
Benjamin R. Shear

Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that among nationally representative samples of 15-year-olds in the United States participating in the 2009, 2012, and 2015 PISA math and reading tests, there are consistent item-format-by-gender differences. On average, male students answer multiple-choice items correctly relatively more often and female students answer constructed-response items correctly relatively more often. These patterns were consistent across 34 additional participating PISA jurisdictions, although the size of the format differences varied and was larger on average in reading than in math. The average magnitude of the format differences is not large enough to be flagged in routine differential item functioning analyses intended to detect test bias, but it is large enough to raise questions about the validity of inferences based on comparisons of scores across gender groups. Researchers and other test users should account for test item format, particularly when comparing scores across gender groups.
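As a hedged illustration of the contrast the abstract reports, the base-R sketch below computes a format-by-gender interaction from item-level proportions correct; all data and ranges are invented for illustration, not taken from PISA.

```r
# Hypothetical sketch (not the paper's analysis): quantify an
# item-format-by-gender interaction from item-level proportions correct.
set.seed(1)
items <- data.frame(
  format   = rep(c("MC", "CR"), each = 20),  # multiple-choice vs. constructed-response
  p_male   = c(runif(20, .55, .75), runif(20, .45, .65)),
  p_female = c(runif(20, .50, .70), runif(20, .50, .70))
)
items$gap <- items$p_male - items$p_female   # per-item male-female difference

# Format-by-gender interaction: mean gender gap on MC items minus mean gap
# on CR items. A positive value mirrors the reported pattern.
mean(items$gap[items$format == "MC"]) - mean(items$gap[items$format == "CR"])
```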

Citations: 0
Detecting Differential Item Functioning in CAT Using IRT Residual DIF Approach
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-04-28 | DOI: 10.1111/jedm.12366
Hwanggyu Lim, Edison M. Choe

The residual differential item functioning (RDIF) detection framework was developed recently under a linear testing context. To explore the potential application of this framework to computerized adaptive testing (CAT), the present study investigated the utility of the RDIF_R statistic both as an index for detecting uniform DIF of pretest items in CAT and as a direct measure of the effect size of uniform DIF. Extensive CAT simulations revealed RDIF_R to have well-controlled Type I error and slightly higher power to detect uniform DIF compared with CATSIB, especially when pretest items were calibrated using fixed-item parameter calibration. Moreover, RDIF_R accurately estimated the amount of uniform DIF irrespective of the presence of impact. Therefore, RDIF_R demonstrates its potential as a useful tool for evaluating both the statistical and practical significance of uniform DIF in CAT.
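As a hedged sketch of the general residual-DIF idea (observed response minus model-implied probability, contrasted across groups), the base-R code below computes a raw-residual contrast for one studied item under a 2PL. It is not the authors' exact RDIF_R formula, and the item parameters and DIF size are invented.

```r
# Hedged sketch of a raw-residual DIF contrast in the spirit of RDIF_R
# (not the authors' exact statistic). One studied item under a 2PL.
p2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))

set.seed(2)
n <- 1000
group <- rep(c("ref", "foc"), each = n / 2)
theta <- rnorm(n)
a <- 1.2; b <- 0.1                      # reference-group item parameters
p <- p2pl(theta, a, b)
p[group == "foc"] <- p2pl(theta[group == "foc"], a, b + 0.4)  # uniform DIF
u <- rbinom(n, 1, p)

# Residuals under the reference calibration (true thetas used for simplicity;
# in practice these would be estimates, e.g., MLE scores).
res <- u - p2pl(theta, a, b)
mean(res[group == "foc"]) - mean(res[group == "ref"])  # negative: harder for focal
```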

Citations: 0
Controlling the Speededness of Assembled Test Forms: A Generalization to the Three-Parameter Lognormal Response Time Model
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-04-27 | DOI: 10.1111/jedm.12364
Benjamin Becker, Sebastian Weirich, Frank Goldhammer, Dries Debeer

When designing or modifying a test, an important challenge is controlling its speededness. To achieve this, van der Linden (2011a, 2011b) proposed using a lognormal response time model, more specifically the two-parameter lognormal model, and automated test assembly (ATA) via mixed integer linear programming. However, this approach has a severe limitation: the two-parameter lognormal model lacks a slope parameter, which means it assumes that all items are equally speed sensitive. From a conceptual perspective, this assumption seems very restrictive. Furthermore, various other empirical studies and new data analyses performed by us show that this assumption almost never holds in practice. To overcome this shortcoming, we bring together the already frequently used three-parameter lognormal model for response times, which contains a slope parameter, and van der Linden's ATA approach for controlling speededness. The proposed extension is demonstrated with multiple empirically based illustrations, including complete and documented R code. Both the original van der Linden approach and our newly proposed approach are available to practitioners in the freely available R package eatATA.
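A minimal base-R sketch of the model the abstract describes, assuming the three-parameter lognormal form ln T_ij ~ N(beta_i - phi_i * tau_j, 1/alpha_i^2), where the slope phi_i is what the two-parameter model fixes at 1. The parameter values and time limit are hypothetical, and this speededness check is far simpler than the eatATA workflow.

```r
# Minimal sketch, assuming ln T_ij ~ N(beta_i - phi_i * tau_j, 1 / alpha_i^2);
# the two-parameter model is the special case phi_i = 1. Values hypothetical.
sim_total_time <- function(tau, alpha, beta, phi) {
  log_t <- rnorm(length(beta), mean = beta - phi * tau, sd = 1 / alpha)
  sum(exp(log_t))                        # total seconds on the assembled form
}

set.seed(3)
alpha <- runif(30, 1.5, 2.5)             # time-discrimination per item
beta  <- runif(30, 3.5, 4.5)             # time-intensity (log-seconds)
phi   <- runif(30, 0.6, 1.4)             # speed sensitivity (the added slope)

# Speededness check: how often would a slow examinee (speed tau at the 5th
# percentile) exceed a 45-minute limit on this 30-item form?
tau_slow <- qnorm(.05, mean = 0, sd = .3)
times <- replicate(2000, sim_total_time(tau_slow, alpha, beta, phi))
mean(times > 45 * 60)
```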

Citations: 0
A Note on Latent Traits Estimates under IRT Models with Missingness
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-04-26 | DOI: 10.1111/jedm.12365
Jinxin Guo, Xin Xu, Tao Xin

Missingness due to not-reached items and omitted items has received much attention in the recent psychometric literature. Such missingness, if not handled properly, leads to biased parameter estimation and inaccurate inferences about examinees, and further erodes the validity of the test. This paper reviews some commonly used IRT-based models allowing missingness, followed by three popular examinee scoring methods: maximum likelihood estimation, maximum a posteriori, and expected a posteriori. Simulation studies were conducted to compare these examinee scoring methods across the commonly used models in the presence of missingness. Results showed that all the methods could infer examinees' ability accurately when the missingness is ignorable. If the missingness is nonignorable, incorporating the missing responses improves the precision of ability estimates for examinees with missingness, especially when the test length is short. In terms of examinee scoring methods, the expected a posteriori method performed better for evaluating latent traits under models allowing missingness. An empirical study based on the PISA 2015 Science Test was also performed.
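To make the scoring comparison concrete, here is a minimal base-R expected a posteriori (EAP) sketch under a 2PL with normal-prior quadrature, contrasting two common treatments of missing responses; the item parameters and response pattern are hypothetical.

```r
# Minimal EAP sketch under a 2PL, comparing two treatments of missingness.
eap <- function(u, a, b, nodes = seq(-4, 4, length.out = 61)) {
  keep <- !is.na(u)                       # ignorable missingness: skip NAs
  lik <- sapply(nodes, function(q) {
    p <- 1 / (1 + exp(-a[keep] * (q - b[keep])))
    prod(p^u[keep] * (1 - p)^(1 - u[keep]))
  })
  post <- lik * dnorm(nodes)              # standard-normal prior
  sum(nodes * post) / sum(post)           # posterior mean of theta
}

a <- rep(1.2, 20); b <- seq(-2, 2, length.out = 20)
u <- c(rep(1, 8), rep(0, 4), rep(NA, 8))  # last eight items not reached

eap(u, a, b)                              # missingness treated as ignorable
u0 <- replace(u, is.na(u), 0)
eap(u0, a, b)                             # missing responses scored incorrect
```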

Citations: 0
Online Monitoring of Test-Taking Behavior Based on Item Responses and Response Times
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-04-17 | DOI: 10.1111/jedm.12367
Suhwa Han, Hyeon-Ah Kang

The study presents multivariate sequential monitoring procedures for examining test-taking behaviors online. The procedures monitor examinees' responses and response times and signal aberrancy as soon as significant change is detected in the test-taking behavior. In particular, the study proposes three schemes to track different indicators of a test-taking mode—the observable manifest variables, latent trait variables, and measurement likelihood. For each procedure, sequential sampling strategies are presented to implement online monitoring. Numerical experimentation based on simulated data suggests that the proposed procedures demonstrate adequate performance. The procedures identified examinees with aberrant behaviors with high detection power and timeliness, while keeping error rates reasonably small. Experimental application to real data also suggested that the procedures have practical relevance to real assessments. Based on observations from the empirical analysis, the study discusses implications and guidelines for practical use.
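As one hedged ingredient of such monitoring (not the authors' multivariate procedure), the base-R sketch below runs a one-sided CUSUM on standardized log response times that signals once cumulative evidence of speeding crosses a control limit; the reference parameters and change point are invented.

```r
# One-sided CUSUM on standardized log response times: accumulate evidence of
# faster-than-expected responding and signal when it exceeds limit h.
cusum_speed <- function(log_rt, mu, sigma, k = 0.5, h = 4) {
  s <- 0
  for (j in seq_along(log_rt)) {
    z <- (mu[j] - log_rt[j]) / sigma[j]   # large z = faster than expected
    s <- max(0, s + z - k)                # accumulate beyond allowance k
    if (s > h) return(j)                  # signal at item j
  }
  NA                                      # no signal during the test
}

set.seed(4)
mu <- rep(4, 40); sigma <- rep(0.5, 40)   # expected log RTs for one examinee
rt <- rnorm(40, mu, sigma)
rt[21:40] <- rnorm(20, mu[21:40] - 1, sigma[21:40])  # speeding from item 21 on
cusum_speed(rt, mu, sigma)                # flags soon after the behavior change
```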

Citations: 0
Detecting Group Collaboration Using Multiple Correspondence Analysis
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-03-23 | DOI: 10.1111/jedm.12363
Joseph H. Grochowalski, Amy Hendrickson

Test takers wishing to gain an unfair advantage often share answers with other test takers, either sharing all answers (a full key) or some (a partial key). Detecting key sharing during a tight testing window requires an efficient, easily interpretable, and rich form of analysis that is descriptive and inferential. We introduce a detection method based on multiple correspondence analysis (MCA) that identifies test takers with unusual response similarities. The method simultaneously detects multiple shared keys (partial or full), plots results, and is computationally efficient as it requires only matrix operations. We describe the method, evaluate its detection accuracy under various simulation conditions, and demonstrate the procedure on a real data set with known test-taking misbehavior. The simulation results showed that the MCA method had reasonably high power under realistic conditions and maintained the nominal false-positive level, except when the group size was very large or partial shared keys had more than 50% of the items. The real data analysis illustrated visual detection procedures and inference about the item responses possibly shared in the key, which was likely shared among 91 test takers, many of whom were confirmed by nonstatistical investigation to have engaged in test-taking misconduct.
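The base-R sketch below works through the core MCA algebra (an SVD of the standardized indicator matrix) and flags the pair of examinees with the most unusual response similarity. The data are simulated with one planted copying pair; this is a skeleton of the idea, not the authors' full detection procedure.

```r
# Minimal MCA sketch: correspondence-analysis SVD of the indicator matrix,
# then pairwise distances between examinee row coordinates.
set.seed(5)
n <- 200; m <- 30
resp <- matrix(sample(LETTERS[1:4], n * m, replace = TRUE), n, m)
resp[2, ] <- resp[1, ]                        # examinee 2 copies examinee 1

Z <- do.call(cbind, lapply(1:m, function(j)   # dummy-code each item's options
  outer(resp[, j], sort(unique(resp[, j])), "==") * 1))
P <- Z / sum(Z)
r <- rowSums(P); cc <- colSums(P)
S <- diag(1 / sqrt(r)) %*% (P - outer(r, cc)) %*% diag(1 / sqrt(cc))
sv <- svd(S)
coords <- diag(1 / sqrt(r)) %*% sv$u[, 1:5] %*% diag(sv$d[1:5])  # 5 dimensions

d <- as.matrix(dist(coords))                  # examinee-by-examinee distances
diag(d) <- Inf
which(d == min(d), arr.ind = TRUE)[1, ]       # closest pair: examinees 1 and 2
```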

Citations: 0
Pretest Item Calibration in Computerized Multistage Adaptive Testing
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-03-10 | DOI: 10.1111/jedm.12361
Rabia Karatoprak Ersen, Won-Chan Lee

The purpose of this study was to compare calibration and linking methods, in terms of item parameter recovery, for placing pretest item parameter estimates on the item pool scale in a 1-3 computerized multistage adaptive testing design. Two models were used: embedded-section, in which pretest items were administered within a separate module, and embedded-items, in which pretest items were distributed across operational modules. The calibration methods were separate calibration with linking (SC) and fixed calibration (FC), with three parallel approaches under each (FC-1 and SC-1; FC-2 and SC-2; FC-3 and SC-3). The FC-1 and SC-1 approaches used only operational items in the routing module to link pretest items. The FC-2 and SC-2 approaches also used only operational items in the routing module for linking, but in addition, the operational items in second-stage modules were freely estimated. The FC-3 and SC-3 approaches used operational items in all modules to link pretest items. The third calibration approach (i.e., FC-3 and SC-3) yielded the best results. For all three approaches, SC outperformed FC in all study conditions, which varied module length, sample size, and examinee distribution.
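As a hedged sketch of the linking step that SC-style methods require, the base-R code below applies mean/sigma linking on the b-parameters of common items and then transports pretest items to the pool scale. Mean/sigma is one standard choice, not necessarily the method used in the study, and all values are hypothetical.

```r
# Mean/sigma linking: find the linear transformation that puts a new
# calibration onto the pool scale, using items estimated on both scales.
link_mean_sigma <- function(b_new, b_old) {
  A <- sd(b_old) / sd(b_new)               # slope of the scale transformation
  B <- mean(b_old) - A * mean(b_new)       # intercept
  c(A = A, B = B)
}

b_pool <- c(-1.2, -0.4, 0.1, 0.8, 1.5)     # common items, pool-scale estimates
b_cal  <- c(-1.0, -0.2, 0.3, 1.1, 1.8)     # same items, new-calibration scale

k <- link_mean_sigma(b_cal, b_pool)
b_pretest <- c(-0.5, 0.6)                  # pretest items on the new scale
unname(k["A"] * b_pretest + k["B"])        # pretest items on the pool scale
```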

Citations: 1
Classical Item Analysis from a Signal Detection Perspective
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-02-27 | DOI: 10.1111/jedm.12358
Lawrence T. DeCarlo

A conceptualization of multiple-choice exams in terms of signal detection theory (SDT) leads to simple measures of item difficulty and item discrimination that are closely related to, but also distinct from, those used in classical item analysis (CIA). The theory defines a "true split," depending on whether or not examinees know an item, and so it provides a basis for using total scores to split item tables, as done in CIA, while also clarifying benefits and limitations of the approach. The SDT item difficulty and discrimination measures differ from those used in CIA in that they explicitly consider the role of distractors and avoid limitations due to range restrictions. A new screening measure is also introduced. The measures are theoretically well-grounded and simple to compute, whether by hand or with standard software for choice models; simulations show that they offer advantages over traditional measures.
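In the spirit of (though not identical to) the SDT measures described, the base-R sketch below splits examinees on total score and expresses item difficulty and discrimination on a normal-deviate scale rather than a raw-proportion scale; the data are simulated and the clamping constant is an implementation convenience.

```r
# Normal-deviate item statistics from a total-score split, d'-style.
sdt_item_stats <- function(u, total) {
  hi <- total >= median(total)                 # upper vs. lower score group
  cl <- function(p) pmin(pmax(p, .005), .995)  # keep qnorm() finite
  z_hi <- qnorm(cl(mean(u[hi])))
  z_lo <- qnorm(cl(mean(u[!hi])))
  c(discrimination = z_hi - z_lo,              # d'-like group separation
    difficulty     = -(z_hi + z_lo) / 2)       # larger values = harder item
}

set.seed(6)
theta <- rnorm(500)
U <- sapply(seq(-1.5, 1.5, length.out = 15),   # 15 items of varying difficulty
            function(b) rbinom(500, 1, plogis(1.3 * (theta - b))))
t(apply(U, 2, sdt_item_stats, total = rowSums(U)))
```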

Citations: 0
Corrigendum: A Residual-Based Differential Item Functioning Detection Framework in Item Response Theory
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-02-26 | DOI: 10.1111/jedm.12362
Hwanggyu Lim, Edison M. Choe, Kyung T. Han

In the original article, it was written that "Then the MLE scoring and DIF analysis with RDIF statistics were performed using the est_score and rdif functions, respectively, in the R (R Core Team, 2019) package irtplay (p. 90)." However, the irtplay package has been removed from the CRAN repository due to intellectual property (IP) violation issues. Instead, a new R package called irtQ (Lim & Wells, 2023) has been released as a successor to irtplay. All IP issues have been resolved in irtQ, ensuring that the package is compliant with industry standards. The original article is available at https://doi.org/10.1111/jedm.12313.

The same est_score and rdif functions used in the original study are also included in irtQ, so it can be used as a drop-in replacement for irtplay. We apologize for any confusion caused by the previous version of the article.

Citations: 0
Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation
IF 1.3 | Psychology (JCR Q1, CAS Tier 4) | Pub Date: 2023-02-19 | DOI: 10.1111/jedm.12360
Jodi M. Casabianca, John R. Donoghue, Hyo Jeong Shin, Szu-Fu Chao, Ikkyu Choi

Using item-response theory to model rater effects provides an alternative to standard performance metrics for rater monitoring and diagnosis. To fit such models, the ratings data must be sufficiently connected to allow estimation of rater effects. Because of the rating designs popular in large-scale testing scenarios, there tends to be a large proportion of missing data, yielding sparse matrices and estimation issues. In this article, we explore the impact of different types of connectedness, or linkage, brought about by using a linkage set—a collection of responses scored by most or all raters. We also explore the impact of the properties and composition of the linkage set, the different connectedness yielded by different rating designs, and the role of scores from automated scoring engines. In designing monitoring systems using the rater response version of the generalized partial credit model, the study results suggest use of a linkage set, especially a large one composed of responses representing the full score scale. Results also show that a double-human-scoring design provides more connectedness than a design with one human and an automated scoring engine. Furthermore, scores from automated scoring engines do not provide adequate connectedness. We discuss considerations for operational implementation and further study.
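To make the connectedness requirement concrete, the base-R sketch below treats raters and responses as nodes of a bipartite graph and checks, via breadth-first search, whether the scored pairs form a single connected component; the toy design shows two disconnected rater groups being joined by a linkage-set response. This illustrates the concept only, not the authors' estimation machinery.

```r
# Connectedness check: raters and responses as nodes of a bipartite graph;
# each scored (rater, response) pair is an edge. BFS tests whether all nodes
# sit in one connected component.
connected <- function(pairs) {
  rn <- paste0("r", pairs$rater); sn <- paste0("s", pairs$response)
  nodes <- unique(c(rn, sn))
  adj <- c(split(sn, rn), split(rn, sn))   # neighbor lists for each node
  seen <- frontier <- nodes[1]
  while (length(frontier)) {
    nxt <- setdiff(unlist(adj[frontier]), seen)
    seen <- c(seen, nxt); frontier <- nxt
  }
  length(seen) == length(nodes)
}

# Two rater groups that never score a common response: not connected ...
connected(data.frame(rater = c(1, 1, 2, 3, 3, 4),
                     response = c(101, 102, 102, 201, 202, 202)))  # FALSE
# ... until a linkage-set response (900) scored across groups joins them.
connected(data.frame(rater = c(1, 1, 2, 3, 3, 4, 2, 3),
                     response = c(101, 102, 102, 201, 202, 202, 900, 900)))  # TRUE
```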

Citations: 1