
Journal of Educational Measurement: Latest Publications

Incorporating Test-Taking Engagement into Multistage Adaptive Testing Design for Large-Scale Assessments
IF 1.4 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-11-10 DOI: 10.1111/jedm.12380
Okan Bulut, Guher Gorgun, Hacer Karamese

The use of multistage adaptive testing (MST) has gradually increased in large-scale testing programs as MST achieves a balanced compromise between linear test design and item-level adaptive testing. MST works on the premise that each examinee gives their best effort when attempting the items, and their responses truly reflect what they know or can do. However, research shows that large-scale assessments may suffer from a lack of test-taking engagement, especially if they are low stakes. Examinees with low test-taking engagement are likely to show noneffortful responding (e.g., answering the items very rapidly without reading the item stem or response options). To alleviate the impact of noneffortful responses on the measurement accuracy of MST, test-taking engagement can be operationalized as a latent trait based on response times and incorporated into the on-the-fly module assembly procedure. To demonstrate the proposed approach, a Monte-Carlo simulation study was conducted based on item parameters from an international large-scale assessment. The results indicated that the on-the-fly module assembly considering both ability and test-taking engagement could minimize the impact of noneffortful responses, yielding more accurate ability estimates and classifications. Implications for practice and directions for future research were discussed.
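The engagement construct described above is anchored in response times. As a minimal illustration (not the authors' on-the-fly module-assembly procedure), the sketch below flags likely noneffortful responses with a normative response-time threshold; the 10%-of-median heuristic, the data, and the function names are assumptions made for this example.

```python
import statistics

# Toy sketch: flag likely noneffortful responses by comparing each
# response time (RT) against a per-item threshold, here a fixed
# fraction of the item's median RT (a common normative heuristic).

def rt_thresholds(rt_by_item, fraction=0.1):
    """Per-item RT threshold: a fraction of the item's median response time."""
    return {item: fraction * statistics.median(rts)
            for item, rts in rt_by_item.items()}

def flag_noneffortful(responses, thresholds):
    """Mark (item, rt) pairs whose RT falls below the item's threshold."""
    return [(item, rt < thresholds[item]) for item, rt in responses]

rts = {"item1": [30.0, 28.0, 35.0, 2.0], "item2": [45.0, 50.0, 3.0, 48.0]}
th = rt_thresholds(rts)
flags = flag_noneffortful([("item1", 2.0), ("item1", 30.0), ("item2", 3.0)], th)
# flags -> [("item1", True), ("item1", False), ("item2", True)]
```

In practice such flags, or a response-time-based latent engagement trait as in the study, would feed into the module-assembly step rather than stand alone.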

Journal of Educational Measurement, 62(1), 57-80. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12380
Citations: 0
Information Functions of Rank-2PL Models for Forced-Choice Questionnaires
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-10-29 DOI: 10.1111/jedm.12379
Jianbin Fu, Xuan Tan, Patrick C. Kyllonen

This paper presents the item and test information functions of the Rank two-parameter logistic models (Rank-2PLM) for items with two (pair) and three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, is the Triplet-2PLM. Fisher's information and directional information are described, and the test information for Maximum Likelihood (ML), Maximum A Posteriori (MAP), and Expected A Posteriori (EAP) trait score estimates is distinguished. Expected item/test information indexes at various levels are proposed and plotted to provide diagnostic information on items and tests. The expected test information indexes for EAP scores may be difficult to compute due to a typical test's vast number of item response patterns. The relationships of item/test information with discrimination parameters of statements, standard errors, and reliability estimates of trait score estimates are discussed and demonstrated using real data. Practical suggestions for checking the various expected item/test information indexes and plots are provided.

Journal of Educational Measurement, 61(1), 125-149.
Citations: 0
Detecting Multidimensional DIF in Polytomous Items with IRT Methods and Estimation Approaches
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-10-15 DOI: 10.1111/jedm.12377
Güler Yavuz Temel

The purpose of this study was to investigate multidimensional DIF with simple and nonsimple structures in the context of the multidimensional graded response model (MGRM). This study examined and compared the performance of the IRT-LR and Wald tests using MML-EM and MHRM estimation approaches with different test factors and test structures, in simulation studies and with real data sets. When the test structure included two dimensions, the IRT-LR (MML-EM) generally performed better than the Wald test and provided higher power rates. If the test included three dimensions, the methods provided similar performance in DIF detection. In contrast, when the number of dimensions in the test was four, MML-EM estimation completely lost precision in estimating nonuniform DIF, even with large sample sizes. The Wald test with the MHRM estimation approach outperformed the Wald test (MML-EM) and IRT-LR (MML-EM), showing higher power and acceptable Type I error rates for nonuniform DIF. Small and/or unbalanced sample sizes, small DIF magnitudes, unequal ability distributions between groups, the number of dimensions, estimation methods, and test structure were evaluated as important test factors for detecting multidimensional DIF.
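The IRT-LR comparison in such studies reduces to a likelihood-ratio statistic between a model constraining an item's parameters to be equal across groups and one freeing them. A minimal sketch with hypothetical log-likelihood values, assuming 2 degrees of freedom (discrimination and difficulty both freed); the numbers are invented:

```python
import math

def lr_statistic(loglik_constrained, loglik_free):
    """G^2 = 2 * (logL_free - logL_constrained); chi-square under H0."""
    return 2.0 * (loglik_free - loglik_constrained)

def chi2_sf_df2(x):
    """Survival function of a chi-square with 2 df: exp(-x/2)."""
    return math.exp(-x / 2.0)

# Hypothetical fitted log-likelihoods for one studied item:
g2 = lr_statistic(-1520.4, -1515.1)   # 10.6
p_value = chi2_sf_df2(g2)
shows_dif = p_value < 0.05            # flag the item as showing DIF
```

The Wald-test alternative examined in the paper instead checks the freed parameter differences against their estimated covariance matrix, avoiding a second model fit.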

Journal of Educational Measurement, 61(1), 69-98.
Citations: 0
MSAEM Estimation for Confirmatory Multidimensional Four-Parameter Normal Ogive Models
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-10-09 DOI: 10.1111/jedm.12378
Jia Liu, Xiangbin Meng, Gongjun Xu, Wei Gao, Ningzhong Shi

In this paper, we develop a mixed stochastic approximation expectation-maximization (MSAEM) algorithm coupled with a Gibbs sampler to compute the marginalized maximum a posteriori estimate (MMAPE) of a confirmatory multidimensional four-parameter normal ogive (M4PNO) model. The proposed MSAEM algorithm not only has the computational advantages of the stochastic approximation expectation-maximization (SAEM) algorithm for multidimensional data, but also alleviates the potential instability caused by label switching, thereby improving estimation accuracy. Simulation studies illustrate the good performance of the proposed MSAEM method, which consistently performs better than SAEM and some other existing methods in multidimensional item response theory. Moreover, the proposed method is applied to a real data set from the 2018 Programme for International Student Assessment (PISA) to demonstrate the usefulness of the 4PNO model as well as MSAEM in practice.
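At the core of SAEM-type algorithms, including the MSAEM variant described above, is a Robbins-Monro stochastic-approximation update that averages simulated quantities (e.g., sufficient statistics computed from Gibbs draws) over iterations. A toy sketch of just that update, not the full M4PNO estimation:

```python
def sa_update(current, draw, step):
    """Robbins-Monro step: move the running approximation toward the new draw."""
    return current + step * (draw - current)

# With step sizes 1/k, the running value is exactly the mean of the draws,
# mimicking how SAEM averages simulated sufficient statistics across iterations.
draws = [2.0, 4.0, 6.0]  # stand-ins for per-iteration simulated statistics
s = 0.0
for k, z in enumerate(draws, start=1):
    s = sa_update(s, z, 1.0 / k)
# s -> 4.0, the mean of the draws
```

Decreasing step sizes are what give SAEM-type algorithms their stability relative to plain Monte Carlo EM; the paper's contribution additionally guards against label switching across the Gibbs draws.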

Journal of Educational Measurement, 61(1), 99-124.
Citations: 0
Measuring the Impact of Peer Interaction in Group Oral Assessments with an Extended Many-Facet Rasch Model
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-09-15 DOI: 10.1111/jedm.12375
Kuan-Yu Jin, Thomas Eckes

Many language proficiency tests include group oral assessments involving peer interaction. In such an assessment, examinees discuss a common topic with others. Human raters score each examinee's spoken performance on specially designed criteria. However, measurement models for analyzing group assessment data usually assume local person independence and thus fail to consider the impact of peer interaction on the assessment outcomes. This research advances an extended many-facet Rasch model for group assessments (MFRM-GA), accounting for local person dependence. In a series of simulations, we examined the MFRM-GA's parameter recovery and the consequences of ignoring peer interactions under the traditional modeling approach. We also used a real dataset from the English-speaking test of the Language Proficiency Assessment for Teachers (LPAT) routinely administered in Hong Kong to illustrate the efficiency of the new model. The discussion focuses on the model's usefulness for measuring oral language proficiency, practical implications, and future research perspectives.
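As background, the basic (dichotomous) many-facet Rasch structure adds a rater-severity term to the usual ability-minus-difficulty logit; the MFRM-GA in the paper additionally models dependence among group members, which this sketch omits. Parameter values are invented.

```python
import math

def mfrm_prob(theta, difficulty, severity):
    """P(success) under a dichotomous many-facet Rasch model:
    logit = examinee ability - task difficulty - rater severity."""
    logit = theta - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))

# A harsher rater (higher severity) lowers the success probability
# for the same examinee and task.
p_lenient = mfrm_prob(1.0, 0.0, -0.5)
p_harsh = mfrm_prob(1.0, 0.0, 0.5)
```

Ignoring peer interaction amounts to assuming each examinee's logit above is unaffected by the other group members, which is exactly the local person independence assumption the study relaxes.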

Journal of Educational Measurement, 61(1), 47-68.
Citations: 0
Sociocognitive Processes and Item Response Models: A Didactic Example
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-09-15 DOI: 10.1111/jedm.12376
Tao Gong, Lan Shuai, Robert J. Mislevy

The usual interpretation of the person and task variables in between-persons measurement models such as item response theory (IRT) is as attributes of persons and tasks, respectively. They can be viewed instead as ensemble descriptors of patterns of interactions among persons and situations that arise from sociocognitive complex adaptive systems (CASs). This view offers insights for interpreting and using between-persons measurement models and connecting with sociocognitive research. In this article, we use data generated from an agent-based model to illustrate relations between “social” and “cognitive” features of a simple underlying CAS and the variables of an IRT model fit to the resulting data. We note how the ideas connect to explanatory item response modeling and briefly comment on implications for score interpretations and uses in practice.

Journal of Educational Measurement, 61(1), 150-173.
Citations: 0
Derek C. Briggs, Historical and Conceptual Foundations of Measurement in the Human Sciences: Credos and Controversies
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-09-09 DOI: 10.1111/jedm.12374
David Torres Irribarra
Journal of Educational Measurement, 60(4), 739-746.
Citations: 0
Using Response Time in Multidimensional Computerized Adaptive Testing
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-07-07 DOI: 10.1111/jedm.12373
Yinhong He, Yuanyuan Qi

In multidimensional computerized adaptive testing (MCAT), item selection strategies are generally constructed based on responses, and they do not consider the response times required by items. This study constructed two new criteria (referred to as DT-inc and DT) for MCAT item selection by utilizing information from response times. The new designs maximize the amount of information per unit time. Furthermore, these two new designs were extended to the DTS-inc and DTS designs to efficiently estimate intentional abilities. Moreover, the EAP method for ability estimation was also equipped with response time. The performances of the response-time-based EAP (RT-based EAP) and the new designs were evaluated in simulation and empirical studies. The results showed that the RT-based EAP significantly improved the ability estimation precision compared with the EAP without using response time, and the new designs dramatically saved testing times for examinees with a small sacrifice of ability estimation precision and item pool usage.
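A unidimensional caricature of such a time-aware criterion (maximize Fisher information per expected second) is sketched below; the MCAT designs in the paper work with multidimensional information and model-based response times, and all pool values here are invented.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(theta, pool):
    """Pick the item maximizing information per expected second."""
    best = max(pool, key=lambda it: info_2pl(theta, it["a"], it["b"]) / it["exp_time"])
    return best["id"]

pool = [
    {"id": "A", "a": 1.5, "b": 0.0, "exp_time": 60.0},  # informative but slow
    {"id": "B", "a": 1.2, "b": 0.0, "exp_time": 20.0},  # less informative, fast
]
chosen = select_item(0.0, pool)  # "B": higher information per second
```

Dividing by expected time is what lets the criterion trade a little per-item information for substantially shorter tests, matching the reported time savings.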

Journal of Educational Measurement, 60(4), 697-738.
Citations: 0
Digital dependence: Online fatigue and coping strategies during the COVID-19 lockdown.
CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-07-01 Epub Date: 2023-02-11 DOI: 10.1177/01634437231154781
Emilie Munch Gregersen, Sofie Læbo Astrupgaard, Malene Hornstrup Jespersen, Tobias Priesholm Gårdhus, Kristoffer Albris

As the COVID-19 pandemic lockdowns forced populations across the world to become completely dependent on digital devices for working, studying, and socializing, there has been no shortage of published studies about the possible negative effects of the increased use of digital devices during this exceptional period. In seeking to empirically address how the concern with digital dependency has been experienced during the pandemic, we present findings from a study of daily self-reported logbooks by 59 university students in Copenhagen, Denmark, over 4 weeks in April and May 2020, investigating their everyday use of digital devices. We highlight two main findings. First, students report high levels of online fatigue, expressed as frustration with their constant reliance on digital devices. On the other hand, students found creative ways of using digital devices for maintaining social relations, helping them to cope with isolation. Such online interactions were nevertheless seen as a poor substitute for physical interactions in the long run. Our findings show how the dependence on digital devices was marked by ambivalence, where digital communication was seen as both the cure against, and cause of, feeling isolated and estranged from a sense of normality.

Vol. 33(1), pp. 967-984. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9922647/pdf/
Citations: 0
Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2023-06-09 DOI: 10.1111/jedm.12372
Benjamin R. Shear

Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that among nationally representative samples of 15-year-olds in the United States participating in the 2009, 2012, and 2015 PISA math and reading tests, there are consistent item-format-by-gender differences. On average, male students answer multiple-choice items correctly relatively more often and female students answer constructed-response items correctly relatively more often. These patterns were consistent across 34 additional participating PISA jurisdictions, although the size of the format differences varied and was larger on average in reading than math. The average magnitude of the format differences is not large enough to be flagged in routine differential item functioning analyses intended to detect test bias but is large enough to raise questions about the validity of inferences based on comparisons of scores across gender groups. Researchers and other test users should account for test item format, particularly when comparing scores across gender groups.
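The descriptive contrast behind this finding can be sketched as a format-by-gender gap in proportion correct. The paper's actual analyses use differential item functioning methods on PISA data; the records below are fabricated toy data for illustration only.

```python
def mean_correct(records, fmt, group):
    """Average proportion correct for one item format within one group."""
    vals = [r["correct"] for r in records if r["format"] == fmt and r["group"] == group]
    return sum(vals) / len(vals)

def format_gap(records, fmt):
    """Male minus female mean proportion correct on a given format."""
    return mean_correct(records, fmt, "M") - mean_correct(records, fmt, "F")

records = [
    {"format": "MC", "group": "M", "correct": 1},
    {"format": "MC", "group": "F", "correct": 0},
    {"format": "CR", "group": "M", "correct": 0},
    {"format": "CR", "group": "F", "correct": 1},
]
mc_gap = format_gap(records, "MC")  # positive: males higher on multiple choice
cr_gap = format_gap(records, "CR")  # negative: females higher on constructed response
```

Opposite-signed gaps across formats, as in the paper, can partially cancel at the total-score level, which is why routine DIF screens may miss them.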

{"title":"Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests","authors":"Benjamin R. Shear","doi":"10.1111/jedm.12372","DOIUrl":"10.1111/jedm.12372","url":null,"abstract":"<p>Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that among nationally representative samples of 15-year-olds in the United States participating in the 2009, 2012, and 2015 PISA math and reading tests, there are consistent item format by gender differences. On average, male students answer multiple-choice items correctly relatively more often and female students answer constructed-response items correctly relatively more often. These patterns were consistent across 34 additional participating PISA jurisdictions, although the size of the format differences varied and were larger on average in reading than math. The average magnitude of the format differences is not large enough to be flagged in routine differential item functioning analyses intended to detect test bias but is large enough to raise questions about the validity of inferences based on comparisons of scores across gender groups. 
Researchers and other test users should account for test item format, particularly when comparing scores across gender groups.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 4","pages":"676-696"},"PeriodicalIF":1.3,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42035945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
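The abstract above refers to routine differential item functioning (DIF) analyses as the standard screen for item-level bias between gender groups. A common such screen is the Mantel-Haenszel procedure, which compares the odds of a correct response for two groups after matching examinees on total score. The sketch below is a minimal illustration under assumed conditions, not the procedure or data used in the paper: the simulated Rasch-style responses, the injected 0.5-logit format effect on item 0, and all names (`mantel_haenszel_dif`, `dif`, `theta`) are hypothetical.

```python
import numpy as np

def mantel_haenszel_dif(correct, group, total_score):
    """Mantel-Haenszel common odds ratio for one studied item.

    correct:     0/1 responses to the studied item
    group:       0 = reference group, 1 = focal group
    total_score: matching variable (total test score) used to form strata
    """
    num = den = 0.0
    for s in np.unique(total_score):
        m = total_score == s
        a = np.sum(m & (group == 0) & (correct == 1))  # reference, correct
        b = np.sum(m & (group == 0) & (correct == 0))  # reference, incorrect
        c = np.sum(m & (group == 1) & (correct == 1))  # focal, correct
        d = np.sum(m & (group == 1) & (correct == 0))  # focal, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den if den > 0 else float("nan")

rng = np.random.default_rng(7)
n_examinees, n_items = 2000, 10
theta = rng.normal(size=n_examinees)                 # abilities
group = rng.integers(0, 2, size=n_examinees)         # 0 = reference, 1 = focal
b = rng.normal(size=n_items)                         # item difficulties

# Item 0 is 0.5 logits harder for the focal group (a simulated format effect).
dif = np.zeros(n_items)
dif[0] = 0.5
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :] - np.outer(group, dif))))
resp = (rng.random((n_examinees, n_items)) < p).astype(int)

total = resp.sum(axis=1)                             # matching score
alpha = mantel_haenszel_dif(resp[:, 0], group, total)
delta = -2.35 * np.log(alpha)                        # ETS delta scale
print(f"MH odds ratio = {alpha:.2f}, ETS delta = {delta:.2f}")
```

An odds ratio above 1 (negative ETS delta) indicates the item favors the reference group after conditioning on total score; ETS practice flags items only beyond fixed delta thresholds, which is why the paper's small average format effects can pass routine DIF screens yet still accumulate across a test form.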