Educational and Psychological Measurement: Latest Articles

Discriminant Validity of Interval Response Formats: Investigating the Dimensional Structure of Interval Widths.
IF 2.3, Zone 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-06-01. Epub Date: 2024-11-25. DOI: 10.1177/00131644241283400
Matthias Kloft, Daniel W Heck

In psychological research, respondents are usually asked to answer questions with a single response value. A useful alternative is an interval response format such as the dual-range slider (DRS), in which respondents provide an interval with a lower and an upper bound for each item. Interval responses may be used to measure psychological constructs such as variability in the domain of personality (e.g., self-ratings), uncertainty in estimation tasks (e.g., forecasting), and ambiguity in judgments (e.g., concerning the pragmatic use of verbal quantifiers). However, it is unclear whether respondents are sensitive to the requirements of a particular task and whether interval widths actually measure the constructs of interest. To test the discriminant validity of interval widths, we conducted a study in which respondents answered 92 items belonging to seven different tasks from the domains of personality, estimation, and judgment. We investigated the dimensional structure of interval widths by fitting exploratory and confirmatory factor models while using an appropriate multivariate logit function to transform the bounded interval responses. The estimated factorial structure closely followed the theoretically assumed structure of the tasks, which varied in their degree of similarity. We did not find a strong overarching general factor, which speaks against a response style influencing interval widths across all tasks and domains. Overall, this indicates that respondents are sensitive to the requirements of different tasks and domains when using interval response formats.
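As a rough illustration of the transformation step mentioned in the abstract, the sketch below maps one bounded interval response to two unbounded values. It assumes responses are rescaled to [0, 1] and uses an additive log-ratio with the right-hand remainder as the reference part. The abstract does not say which multivariate logit the authors used, so the function name `interval_to_logits`, the padding constant `eps`, and the choice of log-ratio are all assumptions made for illustration.

```python
import math


def interval_to_logits(lower, upper, eps=1e-3):
    """Map an interval response on [0, 1] to (location, width) on the real line.

    Hypothetical helper: an additive log-ratio is assumed here, which may
    differ from the transformation used in the article.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("expected 0 <= lower <= upper <= 1")
    # The scale splits into three parts that sum to 1:
    # the region left of the interval, the interval itself, the region right of it.
    parts = [lower, upper - lower, 1.0 - upper]
    # Pad every part so that zero-width parts do not produce log(0).
    left, width, right = [(p + eps) / (1.0 + 3.0 * eps) for p in parts]
    location = math.log(left / right)     # where the interval sits on the scale
    width_logit = math.log(width / right) # how wide it is, relative to the right remainder
    return location, width_logit


narrow = interval_to_logits(0.25, 0.75)
wide = interval_to_logits(0.10, 0.90)
```

A centred interval gives a location of zero, and widening the interval while keeping it centred increases the second value, which is the quantity a factor model of interval widths would work with under these assumptions.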

Pages: 565-588. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11586930/pdf/
Citations: 0
Novick Meets Bayes: Improving the Assessment of Individual Students in Educational Practice and Research by Capitalizing on Assessors' Prior Beliefs.
IF 2.3, Zone 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-06-01. Epub Date: 2024-11-25. DOI: 10.1177/00131644241296139
Steffen Zitzmann, Gabe A Orona, Julian F Lohmann, Christoph König, Lisa Bardach, Martin Hecht

The assessment of individual students is not only crucial in the school setting but also at the core of educational research. Although classical test theory focuses on maximizing insights from student responses, the Bayesian perspective incorporates the assessor's prior belief, thereby enriching assessment with knowledge gained from previous interactions with the student or with similar students. We propose and illustrate a formal Bayesian approach that not only allows assessors to form a stronger belief about a student's competency but also offers a more accurate assessment than classical test theory. In addition, we propose a straightforward method for gauging prior beliefs using two specific items and point to the possibility of integrating additional information.
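The idea of combining an assessor's prior belief with an observed score can be sketched with the textbook normal-normal conjugate update. This is a generic illustration, not necessarily the model proposed in the article: the function name `posterior_normal`, the normality assumptions, and the example numbers are all hypothetical.

```python
def posterior_normal(prior_mean, prior_sd, observed, error_sd):
    """Combine a normal prior belief with one normally distributed observed score.

    Returns the posterior mean and posterior standard deviation.
    Assumes a known measurement-error standard deviation (an illustration only).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / error_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Precision-weighted average of the prior belief and the observed score.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * observed)
    return post_mean, post_var ** 0.5


# Prior belief centred on 100, observed test score 120, equal uncertainty in both.
mean, sd = posterior_normal(prior_mean=100.0, prior_sd=10.0, observed=120.0, error_sd=10.0)
```

With equal prior and measurement uncertainty the posterior mean falls halfway between belief and score, and the posterior standard deviation is smaller than either input, which is the sense in which the combined belief is "stronger".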

Pages: 483-506. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11586934/pdf/
Citations: 0
Invariance: What Does Measurement Invariance Allow Us to Claim?
IF 2.3, Zone 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-06-01. Epub Date: 2024-10-28. DOI: 10.1177/00131644241282982
John Protzko

Measurement involves numerous theoretical and empirical steps; ensuring that our measures operate the same way in different groups is one of them. Measurement invariance occurs when the factor loadings and item intercepts or thresholds of a scale operate similarly for people at the same level of the latent variable in different groups. This is commonly assumed to mean the scale is measuring the same thing in those groups. Here we test the assumption of extending measurement invariance to mean common measurement by randomly assigning American adults (N = 1500) to fill out scales assessing a coherent factor (search for meaning in life) or a nonsense factor measuring nothing. We find that a nonsense scale with items measuring nothing shows strong measurement invariance with the original scale, is reliable, and covaries with other constructs. We show that measurement invariance can occur without measurement. Thus, we cannot infer that measurement invariance means one is measuring the same thing; it may be a necessary but not a sufficient condition.

Pages: 458-482. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562939/pdf/
Citations: 0
Evaluating the Performance of a Regularized Differential Item Functioning Method for Testlet-Based Polytomous Items.
IF 2.1, Zone 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-05-31. DOI: 10.1177/00131644251342512
Jing Huang, M David Miller, Anne Corinne Huggins-Manley, Walter L Leite, Herman T Knopf, Albert D Ritzhaupt

This study investigated the effect of testlets on regularization-based differential item functioning (DIF) detection in polytomous items, focusing on the generalized partial credit model with lasso penalization (GPCMlasso) DIF method. Five factors were manipulated: sample size, magnitude of testlet effect, magnitude of DIF, number of DIF items, and type of DIF-inducing covariates. Model performance was evaluated using false-positive rate (FPR) and true-positive rate (TPR). Results showed that the simulation had effective control of FPR across conditions, while the TPR was differentially influenced by the manipulated factors. Generally, the small testlet effect did not noticeably affect the GPCMlasso model's performance regarding FPR and TPR. The findings provide evidence of the effectiveness of the GPCMlasso method for DIF detection in polytomous items when testlets were present. The implications for future research and limitations were also discussed.
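The two evaluation criteria named in the abstract can be computed for a single simulation replication as sketched below. The function name `fpr_tpr` and the item numbering are hypothetical; the sketch only assumes that each replication yields a set of items flagged as showing DIF and a known set of items simulated with DIF.

```python
def fpr_tpr(flagged, true_dif, n_items):
    """Return (false-positive rate, true-positive rate) for one replication.

    flagged  : items the detection method flagged as DIF
    true_dif : items that were simulated with DIF
    n_items  : total number of items on the test
    """
    flagged, true_dif = set(flagged), set(true_dif)
    true_pos = len(flagged & true_dif)   # DIF items correctly flagged
    false_pos = len(flagged - true_dif)  # DIF-free items wrongly flagged
    n_dif = len(true_dif)
    n_clean = n_items - n_dif
    tpr = true_pos / n_dif if n_dif else float("nan")
    fpr = false_pos / n_clean if n_clean else float("nan")
    return fpr, tpr


# 20 items, items 1-4 simulated with DIF, method flags items 1, 2 and 7.
fpr, tpr = fpr_tpr(flagged={1, 2, 7}, true_dif={1, 2, 3, 4}, n_items=20)
```

In a simulation study these per-replication rates would typically be averaged within each condition before comparing conditions.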

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12126468/pdf/
Citations: 0
Beta-Binomial Model for Count Data: An Application in Estimating Model-Based Oral Reading Fluency.
IF 2.1, Zone 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-05-30. DOI: 10.1177/00131644251335914
Xin Qiao, Akihito Kamata, Yusuf Kara, Cornelis Potgieter, Joseph F T Nese

In this article, the beta-binomial model for count data is proposed, and its application is demonstrated in the context of oral reading fluency (ORF) assessment, where the number of words read correctly (WRC) is of interest. Existing studies adopted the binomial model for count data in similar assessment scenarios. The beta-binomial model, however, takes into account extra variability in count data that has been neglected by the binomial model. Unlike the binomial model, it therefore accommodates potential overdispersion in count data. To estimate model-based ORF scores, WRC and response times were jointly modeled. The full Bayesian Markov chain Monte Carlo method was adopted for model parameter estimation. A simulation study showed adequate parameter recovery of the beta-binomial model and evaluated the performance of model fit indices in selecting the true data-generating models. Further, an empirical analysis illustrated the application of the proposed model using a dataset from a computerized ORF assessment. The obtained findings were consistent with the simulation study and demonstrated the utility of adopting the beta-binomial model for count-type item responses from assessment data.
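The overdispersion point can be made concrete by comparing the variance of a binomial count with that of a beta-binomial count that has the same mean. The sketch below uses the standard variance formulas; the function names and the example values (a 50-word passage, shape parameters 8 and 2) are hypothetical and not taken from the article.

```python
def binomial_variance(n, p):
    """Variance of a Binomial(n, p) count."""
    return n * p * (1.0 - p)


def beta_binomial_variance(n, a, b):
    """Variance of a BetaBinomial(n, a, b) count.

    The success probability is itself Beta(a, b) distributed, which inflates
    the binomial variance by the factor (a + b + n) / (a + b + 1).
    """
    p = a / (a + b)
    return n * p * (1.0 - p) * (a + b + n) / (a + b + 1.0)


n_words = 50          # words in a passage (illustrative)
a, b = 8.0, 2.0       # Beta shape parameters, implying a mean accuracy of 0.8
var_binom = binomial_variance(n_words, a / (a + b))
var_betabinom = beta_binomial_variance(n_words, a, b)
```

Both models expect the same number of words read correctly, but the beta-binomial variance is several times larger here, which is the extra variability a plain binomial model cannot represent.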

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125017/pdf/
Citations: 0
Bayesian Thurstonian IRT Modeling: Logical Dependencies as an Accurate Reflection of Thurstone's Law of Comparative Judgment.
IF 2.1, Zone 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-05-30. DOI: 10.1177/00131644251335586
Hannah Heister, Philipp Doebler, Susanne Frick

Thurstonian item response theory (Thurstonian IRT) is a well-established approach to latent trait estimation with forced choice data of arbitrary block lengths. In the forced choice format, test takers rank statements within each block. This rank is coded with binary variables. Since each rank is awarded exactly once per block, stochastic dependencies arise, for example, when options A and B have ranks 1 and 3, C must have rank 2 in a block of length 3. Although the original implementation of the Thurstonian IRT model can recover parameters well, it is not completely true to the mathematical model and Thurstone's law of comparative judgment, as impossible binary answer patterns have a positive probability. We refer to this problem as stochastic dependencies and it is due to unconstrained item intercepts. In addition, there are redundant binary comparisons resulting in what we call logical dependencies, for example, if within a block A < B and B < C, then A < C must follow and a binary variable for A < C is not needed. Since current Markov Chain Monte Carlo approaches to Bayesian computation are flexible and at the same time promise correct small sample inference, we investigate an alternative Bayesian implementation of the Thurstonian IRT model considering both stochastic and logical dependencies. We show analytically that the same parameters maximize the posterior likelihood, regardless of the presence or absence of redundant binary comparisons. A comparative simulation reveals a large reduction in computational effort for the alternative implementation, which is due to respecting both dependencies. Therefore, this investigation suggests that when fitting the Thurstonian IRT model, all dependencies should be considered.
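The binary coding of rankings, and the fact that some binary answer patterns cannot come from any ranking, can be checked by brute force for a block of three statements. The helper names `ranks_to_binary` and `consistent_patterns` are hypothetical, and the coding convention (1 when the first statement of a pair is preferred) is an assumption for this sketch rather than the article's notation.

```python
from itertools import combinations, permutations


def ranks_to_binary(ranks):
    """Code a ranking as pairwise binary outcomes.

    ranks maps each statement to its rank (1 = most preferred).
    Returns {(a, b): 1 if a is preferred over b else 0} for every pair.
    """
    items = sorted(ranks)
    return {(a, b): int(ranks[a] < ranks[b]) for a, b in combinations(items, 2)}


def consistent_patterns(items):
    """All binary patterns that some full ranking of the items can produce."""
    patterns = set()
    for order in permutations(items):
        ranks = {item: position + 1 for position, item in enumerate(order)}
        patterns.add(tuple(ranks_to_binary(ranks).values()))
    return patterns


block = ["A", "B", "C"]
valid = consistent_patterns(block)   # pairs are ordered (A,B), (A,C), (B,C)
n_pairs = len(block) * (len(block) - 1) // 2
n_possible = 2 ** n_pairs            # binary patterns if pairs were unconstrained
```

For three statements, three binary variables allow eight patterns, but only six correspond to rankings. The two remaining patterns are the intransitive cycles, and once two of the comparisons are known to form a chain the third is determined.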

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125010/pdf/
Citations: 0
Using Biclustering to Detect Cheating in Real Time on Mixed-Format Tests.
IF 2.1, Zone 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-05-24. DOI: 10.1177/00131644251333143
Hyeryung Lee, Walter P Vispoel

We evaluated a real-time biclustering method for detecting cheating on mixed-format assessments that included dichotomous, polytomous, and multi-part items. Biclustering jointly groups examinees and items by identifying subgroups of test takers who exhibit similar response patterns on specific subsets of items. This method's flexibility and minimal assumptions about examinee behavior make it computationally efficient and highly adaptable. To further finetune accuracy and reduce false positives in real-time detection, enhanced statistical significance tests were incorporated into the illustrated algorithms. Two simulation studies were conducted to assess detection across varying testing conditions. In the first study, the method effectively detected cheating on tests composed entirely of either dichotomous or non-dichotomous items. In the second study, we examined tests with varying mixed item formats and again observed strong detection performance. In both studies, detection performance was examined at each timestamp in real time and evaluated under three varying conditions: proportion of cheaters, cheating group size, and proportion of compromised items. Across conditions, the method demonstrated strong computational efficiency, underscoring its suitability for real-time applications. Overall, these results highlight the adaptability, versatility, and effectiveness of biclustering in detecting cheating in real time while maintaining low false-positive rates.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12104213/pdf/
Citations: 0
Using Deep Reinforcement Learning to Decide Test Length.
IF 2.1, Zone 3 (Psychology), Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Pub Date: 2025-05-03. DOI: 10.1177/00131644251332972
James Zoucha, Igor Himelfarb, Nai-En Tang

This study explored the application of deep reinforcement learning (DRL) as an innovative approach to optimize test length. The primary focus was to evaluate whether the current length of the National Board of Chiropractic Examiners Part I Exam is justified. By modeling the problem as a combinatorial optimization task within a Markov Decision Process framework, an algorithm capable of constructing test forms from a finite set of items while adhering to critical structural constraints, such as content representation and item difficulty distribution, was used. The findings reveal that although the DRL algorithm was successful in identifying shorter test forms that maintained comparable ability estimation accuracy, the existing test length of 240 items remains advisable as we found shorter test forms did not maintain structural constraints. Furthermore, the study highlighted the inherent adaptability of DRL to continuously learn about a test-taker's latent abilities and dynamically adjust to their response patterns, making it well-suited for personalized testing environments. This dynamic capability supports real-time decision-making in item selection, improving both efficiency and precision in ability estimation. Future research is encouraged to focus on expanding the item bank and leveraging advanced computational resources to enhance the algorithm's search capacity for shorter, structurally compliant test forms.

Citations: 0
Evaluating Change in Adjusted R-Square and R-Square Indices: A Latent Variable Method Application.
IF 2.1 Zone 3 Psychology Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-04-11 DOI: 10.1177/00131644251329178
Tenko Raykov, Christine DiStefano

A procedure for interval estimation of the difference in the adjusted R-square index for nested linear models is discussed. The method yields as a byproduct confidence intervals for their standard R-square difference, as well as for the adjusted and standard R-squares associated with each model. The resulting interval estimate of the difference in adjusted R-square represents a useful and informative complement to the commonly used R-square change statistic and its significance test in model selection and contains substantially more information than that test. The outlined procedure is readily employed with popular software in empirical educational and psychological studies and is illustrated with numerical data.
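To make the quantities in this abstract concrete, the following minimal sketch computes point estimates of the R-square change and the adjusted R-square change for two nested linear models. It is not the authors' latent variable procedure and produces no confidence intervals; it only applies the standard adjusted R-square formula, 1 - (1 - R^2)(n - 1)/(n - p - 1). The function names and the numeric inputs are hypothetical and chosen for illustration only.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-square for a linear model with p predictors fit to n cases."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def r2_changes(r2_full: float, p_full: int,
               r2_reduced: float, p_reduced: int, n: int) -> tuple[float, float]:
    """Return (R-square change, adjusted R-square change) for nested models."""
    delta_r2 = r2_full - r2_reduced
    delta_adj = adjusted_r2(r2_full, n, p_full) - adjusted_r2(r2_reduced, n, p_reduced)
    return delta_r2, delta_adj


if __name__ == "__main__":
    # Hypothetical values, not taken from the article.
    d_r2, d_adj = r2_changes(0.50, 4, 0.40, 2, n=101)
    print(f"R-square change: {d_r2:.4f}")
    print(f"Adjusted R-square change: {d_adj:.4f}")
```

Because the adjustment penalizes the two extra predictors in the full model, the adjusted change in this example is smaller than the raw R-square change.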

Citations: 0
Differential Item Functioning Effect Size Use for Validity Information.
IF 2.3 Zone 3 Psychology Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-04-01 Epub Date: 2024-11-22 DOI: 10.1177/00131644241293694
W Holmes Finch, Maria Dolores Hidalgo Montesinos, Brian F French, Maria Hernandez Finch

There has been an emphasis on effect sizes for differential item functioning (DIF) with the purpose to understand the magnitude of the differences that are detected through statistical significance testing. Several different effect sizes have been suggested that correspond to the method used for analysis, as have different guidelines for interpretation. The purpose of this simulation study was to compare the performance of the DIF effect size measures described for quantifying and comparing the amount of DIF in two assessments. Several factors were manipulated that were thought to influence the effect sizes or are known to influence DIF detection. This study asked the following two questions. First, do the effect sizes accurately capture aggregate DIF across items? Second, do effect sizes accurately identify which assessment has the least amount of DIF? We highlight effect sizes that had support for performing well across several simulated conditions. We also apply these effect sizes to a real data set to provide an example. Results of the study revealed that the log odds ratio of fixed effects ($\ln \overline{\mathrm{OR}}_{\mathrm{FE}}$) and the variance of the Mantel-Haenszel log odds ratio ($\hat{\tau}^{2}$) were most accurate for identifying which test contains more DIF. We point to future directions with this work to aid the continued focus on effect sizes to understand DIF magnitude.
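As background for the Mantel-Haenszel quantity named in this abstract, here is a minimal sketch of the classical Mantel-Haenszel common log odds ratio for a single item, pooled over score strata. It does not reproduce the aggregate effect sizes compared in the study (the fixed-effects mean log odds ratio or the variance estimate); it only shows the per-item building block. The function name, the tuple layout, and the counts are hypothetical.

```python
import math
from typing import Iterable, Tuple

# Each stratum is (a, b, c, d):
#   a = reference group correct,  b = reference group incorrect,
#   c = focal group correct,      d = focal group incorrect.
Stratum = Tuple[int, int, int, int]


def mh_log_odds_ratio(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel common log odds ratio pooled across score strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if numerator <= 0 or denominator <= 0:
        raise ValueError("log odds ratio is undefined for these counts")
    return math.log(numerator / denominator)


if __name__ == "__main__":
    # Hypothetical counts for one item in two ability strata.
    example = [(10, 5, 8, 7), (20, 10, 10, 20)]
    print(f"MH log odds ratio: {mh_log_odds_ratio(example):.4f}")
```

A value near zero indicates little DIF for the item; positive values here mean the item favors the reference group after matching on ability stratum.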

Citations: 0