
Latest articles in Educational and Psychological Measurement

Consistent Factor Score Regression: A Better Alternative for Uncorrected Factor Score Regression?
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-04 | DOI: 10.1177/00131644251399588
Jasper Bogaert, Wen Wei Loh, Yves Rosseel

Researchers in the behavioral, educational, and social sciences often aim to analyze relationships among latent variables. Structural equation modeling (SEM) is widely regarded as the gold standard for this purpose. A straightforward alternative for estimating the structural model parameters is uncorrected factor score regression (UFSR), where factor scores are first computed and then employed in regression or path analysis. Unfortunately, the most commonly used factor scores (i.e., Regression and Bartlett factor scores) may yield biased estimates and invalid inferences when using this approach. In recent years, factor score regression (FSR) has enjoyed several methodological advancements to address this inconsistency. Despite these advancements, the use of FSR with correlation-preserving factor scores, here termed consistent factor score regression (cFSR), has received limited attention. In this paper, we revisit cFSR and compare its advantages and disadvantages relative to other recent FSR and SEM methods. We conducted an extensive simulation study comparing cFSR with other estimation approaches, assessing their performance in terms of convergence rate, bias, efficiency, and type I error rate. The findings indicate that cFSR outperforms UFSR while maintaining the conceptual simplicity of UFSR. We encourage behavioral, educational, and social science researchers to avoid UFSR and adopt cFSR as an alternative to SEM.
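As a rough illustration of why UFSR can mislead, the sketch below (all loadings, variances, and the true structural slope of 0.5 are made-up values, not from the article) computes Bartlett factor scores for a two-factor model and regresses one set of scores on the other; the OLS slope is attenuated because the scores still carry measurement error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, theta = 20_000, 0.8, 0.36          # assumed loading / unique variance

# Structural model: eta2 = 0.5 * eta1 + zeta  (true slope = 0.5)
eta1 = rng.normal(size=n)
eta2 = 0.5 * eta1 + rng.normal(scale=np.sqrt(0.75), size=n)

# Three indicators per factor
Y1 = lam * eta1[:, None] + rng.normal(scale=np.sqrt(theta), size=(n, 3))
Y2 = lam * eta2[:, None] + rng.normal(scale=np.sqrt(theta), size=(n, 3))

def bartlett_scores(Y, loading, uniq):
    """Bartlett factor scores: (L' T^-1 L)^-1 L' T^-1 (y - mean)."""
    L = np.full((Y.shape[1], 1), loading)
    Tinv = np.eye(Y.shape[1]) / uniq
    W = np.linalg.solve(L.T @ Tinv @ L, L.T @ Tinv)
    return (Y - Y.mean(axis=0)) @ W.T

f1 = bartlett_scores(Y1, lam, theta).ravel()
f2 = bartlett_scores(Y2, lam, theta).ravel()

# UFSR step: plain OLS of one factor score on the other
slope = np.cov(f1, f2)[0, 1] / np.var(f1)
print(round(slope, 3))   # noticeably below the true 0.5
```

Correlation-preserving or bias-corrected factor-score methods of the kind the article studies aim to undo exactly this attenuation.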

Citations: 0
Empowering Expert Judgment: A Data-Driven Decision Framework for Standard Setting in High-Dimensional and Data-Scarce Assessments.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-02 | DOI: 10.1177/00131644251405406
Tianpeng Zheng, Zhehan Jiang, Zhichen Guo, Yuanfang Liu

A critical methodological challenge in standard setting arises in small-sample, high-dimensional contexts where the number of items substantially exceeds the number of examinees. Under such conditions, conventional data-driven methods that rely on parametric models (e.g., item response theory) often become unstable or fail due to unreliable parameter estimation. This study investigates two families of data-driven methods: information-theoretic and unsupervised clustering, offering a potential solution to this challenge. Using a Monte Carlo simulation, we systematically evaluate 15 such methods to establish an evidence-based framework for practice. The simulation manipulated five factors, including sample size, the item-to-examinee ratio, mixture proportions, item quality, and ability separation. Method performance was evaluated using multiple criteria, including Relative Error, Classification Accuracy, Sensitivity, Specificity, and Youden's Index. Results indicated that no single method is universally superior; the optimal choice depends on the examinee mixture proportion. Specifically, the information-theoretic method QIR (quantile information ratio) excelled in scenarios with a dominant non-competent group, where high specificity was critical. Conversely, in highly selective contexts with balanced proficiency groups, the clustering methods CHI (Calinski-Harabasz index) and sum of squared error (SSE) demonstrated the highest classification effectiveness. Bayesian kernel density estimation (BKDE), however, consistently performed as a robust, balanced method across conditions. These findings provide practitioners with a clear decision framework for selecting a defensible, data-driven standard-setting method when traditional approaches are infeasible.
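A minimal sketch of one clustering-style standard-setting idea, loosely in the spirit of the SSE method named above (the group proportions, test length, and midpoint rule are illustrative assumptions): split the sorted total scores into two groups so that the within-group sum of squared errors is minimized, and take the midpoint between the groups as the cut score.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed 40-item test: a larger non-competent group and a competent group
scores = np.concatenate([rng.binomial(40, 0.45, 120), rng.binomial(40, 0.80, 80)])

def sse_cut(x):
    """Cut score minimizing the within-group sum of squared errors
    over all two-group splits of the sorted scores."""
    xs = np.sort(np.asarray(x, dtype=float))
    best_sse, best_cut = np.inf, None
    for i in range(1, len(xs)):
        lo, hi = xs[:i], xs[i:]
        sse = ((lo - lo.mean()) ** 2).sum() + ((hi - hi.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_cut = sse, (xs[i - 1] + xs[i]) / 2
    return best_cut

print(sse_cut(scores))   # lands between the two score clusters
```

This is exactly the kind of data-driven rule the simulation evaluates: it needs no parametric item model, so it stays usable when the item-to-examinee ratio is large.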

Citations: 0
Evaluation of Residual-Based Fit Statistics for Item Response Theory Models in the Presence of Non-Responses.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-24 | DOI: 10.1177/00131644251393444
Minho Lee, Juyoung Jung

Residual-based fit statistics, which compare observed item statistics (e.g., proportions) with model-implied probabilities, are widely used to evaluate model fit, item fit, and local dependence in item response theory (IRT) models. Despite the prevalence of item non-responses in empirical studies, their impact on these statistics has not been systematically examined. Existing software packages often apply heuristic treatments (e.g., listwise or pairwise deletion), which can distort fit statistics because missing data further inflate discrepancies between observed and expected proportions. This study evaluates the appropriateness of such treatments through extensive simulation. Results show that deletion methods degrade the accuracy of fit testing: fit indices are inflated under both null and power conditions, with the bias worsening as missingness increases. In addition, the impact of missing data exceeds that of model misspecification. Practical recommendations and alternative methods are discussed to guide applied researchers.
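To see why deletion inflates residual-based discrepancies, a toy sketch (the Rasch-type item and ability-dependent non-response mechanism are assumptions for illustration, not the article's design): deleting non-responses shifts the observed proportion correct away from the complete-data value that the model-implied probability targets.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
theta = rng.normal(size=n)                        # latent ability
p = 1 / (1 + np.exp(-theta))                      # Rasch-type item, difficulty 0
y = rng.binomial(1, p).astype(float)

# Non-response is more likely for low-ability examinees
miss = rng.random(n) < 1 / (1 + np.exp(theta + 1.0))
y_obs = np.where(miss, np.nan, y)

full_prop = y.mean()              # proportion correct with complete data (~.50)
deleted_prop = np.nanmean(y_obs)  # proportion after deleting non-responses
print(round(full_prop, 3), round(deleted_prop, 3))
```

The gap between the two proportions is pure missing-data artifact, yet a residual-based statistic computed on the deleted data would attribute it to model misfit.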

Citations: 0
Conditional Reliability of Weighted Test Scores on a Bounded D-Scale.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-20 | DOI: 10.1177/00131644251396543
Dimiter M Dimitrov, Dimitar V Atanasov

Based on previous research on conditional reliability for number-correct test scores, conditioned on levels of the logit scale in item response theory, this article deals with the conditional reliability of classical-type weighted scores conditioned on latent levels of a bounded scale. This is done in the framework of the D-scoring method of measurement (D-scale, bounded from 0 to 1). Along with the conditional reliability of weighted D-scores, conditioned on latent levels of the D-scale, some additional measures of precision are presented: conditional standard error, conditional signal-to-noise ratio, and marginal reliability. Syntax code (in R) for all computations is also provided.

Citations: 0
Collapsing Sparse Responses in Likert-Type Scale Data: Advantages and Disadvantages for Model Fit in CFA.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-19 | DOI: 10.1177/00131644251401097
Jin Liu, Yu Bao, Christine DiStefano, Wei Jiang

Applied researchers often encounter situations where certain item response categories receive very few endorsements, resulting in sparse data. Collapsing categories may mitigate sparsity by increasing cell counts, yet the methodological consequences of this practice remain insufficiently explored. The current study examined the effects of response collapsing in Likert-type scale data through a simulation study under the confirmatory factor analysis model. Sparse response categories were collapsed to determine the impact on fit indices (i.e., chi-square, comparative fit index [CFI], Tucker-Lewis index [TLI], root mean square error of approximation [RMSEA], and standardized root mean square residual [SRMR]). Findings indicate that category collapsing has a significant impact when sparsity is severe, leading to reduced model rejections in both correctly specified and misspecified models. In addition, different fit indices exhibited varying sensitivities to data collapsing. Specifically, RMSEA was recommended for correctly specified models, and TLI with a cut-off value of .95 was recommended for misspecified models. The empirical analysis was aligned with the simulation results. These results provide valuable insights for researchers confronted with sparse data in applied measurement contexts.
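A minimal sketch of the collapsing operation itself (the category probabilities and the merge direction are illustrative assumptions): merge a sparsely endorsed top category into its neighbor before fitting the CFA.

```python
import numpy as np

rng = np.random.default_rng(3)
# Assumed 5-point item (coded 0-4) with a rarely endorsed top category
probs = [0.30, 0.35, 0.25, 0.09, 0.01]
x = rng.choice(5, size=1000, p=probs)
print(np.bincount(x, minlength=5))       # sparse last cell

# Collapse the sparse top category (4) into its neighbor (3)
x = np.where(x == 4, 3, x)
print(np.bincount(x, minlength=5))       # four effective categories remain
```

The study's point is that this innocuous-looking step changes the behavior of chi-square-based fit indices, especially under severe sparsity, so the decision to collapse should be made with the model-evaluation consequences in mind.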

Citations: 0
An Experimental Study on the Impact of Survey Stakes on Response Inconsistency in Mixed-Worded Scales.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-19 | DOI: 10.1177/00131644251395323
Michalis P Michaelides, Evi Konstantinidou

Respondent behavior in questionnaires may vary in terms of attention, effort, and consistency depending on the survey administration context and motivational conditions. This pre-registered experimental study examined whether motivational context influences response inconsistency, response times, and the role of conscientiousness in survey responding. A sample of 66 university students in Cyprus completed five psychological scales under both low-stakes and high-stakes instructions in a counterbalanced within-subjects design. To identify inconsistent respondents, two index-based methods were used: the mean absolute difference (MAD) index and Mahalanobis distance. Results showed that inconsistent responding was somewhat more frequent under low-stakes conditions, although differences were generally small and significant only for selected scales when using a lenient MAD threshold. By contrast, internal consistency reliability was slightly higher, and response times were significantly longer under high-stakes instructions, indicating greater deliberation. Conscientiousness predicted lower inconsistency only in the low-stakes condition. Overall, high-stakes instructions did not substantially reduce inconsistent responding but fostered longer response times and modest gains in reliability, suggesting enhanced behavioral engagement. Implications for survey design and data quality in psychological and educational research are discussed.
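One simple formulation of an index-based screen of this kind (the scale layout, reversal rule, and threshold below are assumptions for illustration, not the study's exact MAD definition): reverse-score the negatively worded items and flag respondents whose positive-item and reversed-negative-item means diverge.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy 6-item mixed-worded scale (1-5): items 0-2 positive, 3-5 negative
consistent = np.column_stack([rng.integers(4, 6, (50, 3)),   # agrees with positives
                              rng.integers(1, 3, (50, 3))])  # disagrees with negatives
careless = rng.integers(1, 6, (10, 6))                       # random responding
X = np.vstack([consistent, careless]).astype(float)

# Reverse-score the negative items, then compare each person's
# positive-item mean with the reversed-negative-item mean
X[:, 3:] = 6 - X[:, 3:]
mad = np.abs(X[:, :3].mean(axis=1) - X[:, 3:].mean(axis=1))

flagged = mad > 1.0   # lenient illustrative threshold
print(flagged.sum())  # only careless rows can exceed the threshold here
```

Mahalanobis distance, the study's second index, plays the complementary role of flagging multivariate outliers rather than positive-negative disagreement.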

Citations: 0
Reconceptualizing Scoring Reliability Through Linguistic Similarity.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-19 | DOI: 10.1177/00131644251397428
Ji Yoon Jung, Ummugul Bezirhan, Matthias von Davier

Conventional cross-country scoring reliability in international large-scale assessments often depends on double scoring, which typically involves relatively small samples of multilingual responses. To extend the reach of reliability estimation, this study introduces the Linguistic-integrated Reliability Audit (LiRA), a novel method that measures scoring reliability using an entire dataset in a large-scale, multilingual context. LiRA automatically generates a second score for each response by analyzing its semantic alignment within a neighborhood of similar responses, then applies a weighted majority voting to determine a consensus score. Results demonstrate that LiRA provides a more comprehensive and systematic estimation of scoring reliability at the item, country, and language levels, while preserving the fundamental concepts of traditional reliability.
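The consensus step can be sketched as follows (a simplified reading of the description above; the neighborhood, similarity values, and the helper name `weighted_consensus` are hypothetical, not the LiRA implementation):

```python
import numpy as np

def weighted_consensus(neighbor_scores, similarities):
    """Similarity-weighted majority vote over a neighborhood of
    already-scored responses."""
    scores = np.asarray(neighbor_scores)
    w = np.asarray(similarities, dtype=float)
    totals = [(w[scores == c].sum(), c) for c in np.unique(scores)]
    return max(totals)[1]   # score with the largest summed similarity

# Five similar responses with human scores; the two high-similarity
# neighbors outweigh the raw count of score-2 neighbors
print(weighted_consensus([1, 1, 2, 2, 2], [0.95, 0.90, 0.60, 0.55, 0.50]))  # -> 1
```

Because every scored response gets such a machine-generated second score, agreement can be audited over the whole dataset rather than over a small double-scored subsample.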

Citations: 0
When Cluster-Robust Inferences Fail.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-12-19 | DOI: 10.1177/00131644251393203
Francis Huang

Although cluster-robust standard errors (CRSEs) are commonly used to account for violations of the independence of observations in nested data, an underappreciated issue is that there are several instances in which CRSEs can fail to maintain the nominal Type I error rate. These situations (e.g., analyzing data with imbalanced cluster sizes) can readily be found in various types of education-related datasets and are important to consider when computing statistical inference tests with cluster-level predictors. Using a Monte Carlo simulation, we investigated these conditions and tested alternative estimators and degrees-of-freedom (df) adjustments to assess how well they could ameliorate the issues related to the use of the traditional CRSE (CR1) estimator with both continuous and dichotomous predictors. Findings showed that the bias-reduced linearization estimator (CR2) and the jackknife estimator (CR3), together with df adjustments, were generally effective at maintaining Type I error rates for most of the conditions tested. Results also indicated that CR1, when paired with df based on the effective cluster size, was also acceptable. We emphasize the importance of clearly describing the nested data structure, as the characteristics of the dataset can influence Type I error rates when using CRSEs.
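For reference, the traditional CR1 estimator discussed above can be sketched in a few lines (the simulation settings are illustrative, and `cr1_se` is a hypothetical helper, not the authors' code): the sandwich variance sums score contributions within each cluster and applies the usual small-sample factor.

```python
import numpy as np

def cr1_se(X, y, cluster):
    """OLS with CR1 (Liang-Zeger) cluster-robust standard errors,
    using the factor G/(G-1) * (N-1)/(N-k)."""
    X = np.asarray(X, float); y = np.asarray(y, float)
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta
    XtX_inv = np.linalg.inv(X.T @ X)
    groups = np.unique(cluster)
    G = len(groups)
    meat = np.zeros((k, k))
    for g in groups:                      # sum within-cluster score products
        s = X[cluster == g].T @ u[cluster == g]
        meat += np.outer(s, s)
    c = G / (G - 1) * (n - 1) / (n - k)
    V = c * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

rng = np.random.default_rng(5)
cluster = np.repeat(np.arange(20), 30)            # 20 balanced clusters of 30
re = rng.normal(size=20)[cluster]                 # cluster random effect
x = rng.normal(size=600)
y = 1.0 + 0.5 * x + re + rng.normal(size=600)
X = np.column_stack([np.ones(600), x])
beta, se = cr1_se(X, y, cluster)
print(np.round(beta, 2), np.round(se, 3))
```

With balanced clusters like these, CR1 behaves well; the article's point is that with imbalanced cluster sizes or cluster-level predictors, CR2 or CR3 with df adjustments is the safer choice.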

引用次数: 0
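To make the abstract's starting point concrete, here is a minimal numpy sketch of the traditional CR1 cluster-robust "sandwich" variance estimator for OLS, with its common small-sample scaling. The function name, the toy data, and the simulated cluster structure are illustrative, not taken from the article; the CR2/CR3 leverage adjustments the article recommends are not shown.

```python
import numpy as np

def cr1_vcov(X, y, cluster):
    """OLS with the CR1 cluster-robust covariance: the sandwich estimator
    scaled by the usual small-sample factor G/(G-1) * (n-1)/(n-k)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    clusters = np.unique(cluster)
    G = len(clusters)
    # "meat": sum over clusters g of (X_g' u_g)(X_g' u_g)'
    meat = np.zeros((k, k))
    for g in clusters:
        idx = cluster == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    scale = (G / (G - 1)) * ((n - 1) / (n - k))
    return beta, scale * bread @ meat @ bread

# toy example: 20 clusters of imbalanced size with a cluster-level predictor
rng = np.random.default_rng(0)
sizes = rng.integers(2, 15, size=20)            # imbalanced cluster sizes
cluster = np.repeat(np.arange(20), sizes)
z = np.repeat(rng.normal(size=20), sizes)       # cluster-level covariate
u = np.repeat(rng.normal(size=20), sizes)       # cluster random effect
y = 1.0 + 0.5 * z + u + rng.normal(size=len(cluster))
X = np.column_stack([np.ones(len(cluster)), z])
beta, V = cr1_vcov(X, y, cluster)
se = np.sqrt(np.diag(V))                        # CR1 standard errors
```

Note that the abstract's point is precisely that pairing these CR1 standard errors with naive df (e.g., G − 1) can be anticonservative under imbalance; it reports that df based on the effective number of clusters, or the CR2/CR3 estimators with df adjustments, behave better.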
From Linear Geometry to Nonlinear and Information-Geometric Settings in Test Theory: Bregman Projections as a Unifying Framework.
IF 2.3 Region 3 Psychology Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-12-12 DOI: 10.1177/00131644251393483
Bruno D Zumbo

This article develops a unified geometric framework linking expectation, regression, test theory, reliability, and item response theory through the concept of Bregman projection. Building on operator-theoretic and convex-analytic foundations, the framework extends the linear geometry of classical test theory (CTT) into nonlinear and information-geometric settings. Reliability and regression emerge as measures of projection efficiency: linear in Hilbert space and nonlinear under convex potentials. The exposition demonstrates that classical conditional expectation, least-squares regression, and information projections in exponential-family models share a common mathematical structure defined by the Bregman divergence. By situating CTT within this broader geometric context, the article clarifies relationships between measurement, expectation, and statistical inference, providing a coherent foundation for nonlinear measurement and estimation in psychometrics.

Citations: 0
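The abstract's unifying object, the Bregman divergence D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩, can be made concrete with a short sketch. The two potentials below are standard textbook choices, not drawn from the article: the squared Euclidean norm recovers squared distance (so Bregman projection reduces to least squares, the linear CTT case), while negative entropy recovers the Kullback-Leibler divergence (the exponential-family information projection the abstract mentions).

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Generic Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Potential 1: phi(v) = 0.5 ||v||^2  ->  D(x, y) = 0.5 ||x - y||^2 (least squares)
sq = lambda v: 0.5 * np.dot(v, v)
sq_grad = lambda v: v

# Potential 2: negative entropy phi(p) = sum p log p on probability vectors
# ->  D(p, q) = KL(p || q) when sum(p) = sum(q) = 1
negent = lambda p: np.sum(p * np.log(p))
negent_grad = lambda p: np.log(p) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])

d_euclid = bregman(sq, sq_grad, x, y)      # equals 0.5 * ||x - y||^2
d_kl = bregman(negent, negent_grad, x, y)  # equals sum x * log(x / y)
```

One divergence family thus interpolates between the orthogonal projections of classical test theory and the information projections of exponential-family models, which is the sense in which the article treats Bregman projection as a unifying framework.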
Reliability as Projection in Operator-Theoretic Test Theory: Conditional Expectation, Hilbert Space Geometry, and Implications for Psychometric Practice.
IF 2.3 Region 3 Psychology Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-11-12 DOI: 10.1177/00131644251389891
Bruno D Zumbo

This article reconceptualizes reliability as a theorem derived from the projection geometry of Hilbert space rather than an assumption of classical test theory. Within this framework, the true score is defined as the conditional expectation E(X ∣ G), representing the orthogonal projection of the observed score onto the σ-algebra of the latent variable. Reliability, expressed as Rel(X) = Var[E(X ∣ G)] / Var(X), quantifies the efficiency of this projection: the squared cosine between X and its true-score projection. This formulation unifies reliability with regression R², factor-analytic communality, and predictive accuracy in stochastic models. The operator-theoretic perspective clarifies that measurement error corresponds to the orthogonal complement of the projection, and reliability reflects the alignment between observed and latent scores. Numerical examples and measure-theoretic proofs illustrate the framework's generality. The approach provides a rigorous mathematical foundation for reliability, connecting psychometric theory with modern statistical and geometric analysis.

Citations: 0
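The identity Rel(X) = Var[E(X ∣ G)] / Var(X) is easy to illustrate by simulation. The sketch below assumes the simplest true-score model X = T + e with T and e independent and e mean-zero, so that E(X ∣ T) = T and reliability reduces to Var(T)/Var(X) = σ²_T / (σ²_T + σ²_e); the variable names and variances are illustrative, not from the article.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
var_t, var_e = 1.0, 0.25                 # true-score and error variances

T = rng.normal(0.0, np.sqrt(var_t), n)   # latent true score, plays the role of E(X | G)
e = rng.normal(0.0, np.sqrt(var_e), n)   # measurement error, independent of T
X = T + e                                # observed score

# Rel(X) = Var[E(X | G)] / Var(X); with E(X | T) = T this is Var(T) / Var(X)
rel_hat = T.var() / X.var()
rel_theory = var_t / (var_t + var_e)     # = 0.8 for these variances

# the same quantity as a regression R^2: squared correlation of X with its projection
r2 = np.corrcoef(X, T)[0, 1] ** 2
```

Both `rel_hat` and `r2` land near the theoretical 0.8, which is the abstract's point that reliability, regression R² on the true score, and the squared cosine of the projection angle are one and the same quantity.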
Educational and Psychological Measurement