
Latest Publications from Applied Psychological Measurement

Semi-Parametric Item Response Theory With O'Sullivan Splines for Item Responses and Response Time.
IF 1.2 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-07-01 | Epub Date: 2025-02-02 | DOI: 10.1177/01466216251316277
Chen-Wei Liu

Response time (RT) has been an essential resource for supplementing the estimation accuracy of latent traits and item parameters in educational testing. Most item response theory (IRT) approaches are based on parametric RT models. However, since test takers may alter their behaviors during a test due to motivation or strategy shifts, fatigue, or other causes, parametric IRT models are unlikely to capture such subtle and nonlinear information. In this work, we propose a novel semi-parametric IRT model with O'Sullivan splines to accommodate the flexible mean RT shapes and explore the underlying nonlinear relationships between latent traits and RT. A simulation study was conducted to demonstrate the substantial improvement in parameter estimation achieved by the new model, as well as the detriment of using parametric models in terms of biases and measurement errors. Using this model, a dataset of mathematics test scores and RT from the Programme for International Student Assessment was analyzed to demonstrate the evident nonlinearity and to compare the proposed model with existing models in terms of model fitting. The findings presented in this study indicate the promising nature of the new approach, suggesting its potential as an additional psychometric tool to enhance test reliability and reduce measurement errors.
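The abstract describes the model only verbally, so the following is a minimal, self-contained Python sketch of the ingredient it builds on: a B-spline design matrix in the latent trait, which lets the mean (log) response time be a flexible, non-linear function of theta. Everything below (knot placement, the log-normal RT form, the values of coef, beta_j, sigma_j) is an illustrative assumption rather than the paper's specification; an O'Sullivan spline additionally penalizes the integrated squared second derivative of the fitted curve, which is omitted here.

```python
import numpy as np

def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at the points x via the
    Cox-de Boor recursion; `knots` is a non-decreasing, clamped knot vector."""
    x = np.atleast_1d(x).astype(float)
    B = np.zeros((len(x), len(knots) - 1))
    for i in range(len(knots) - 1):                    # degree-0 indicator functions
        B[:, i] = (x >= knots[i]) & (x < knots[i + 1])
    B[x == knots[-1], len(knots) - degree - 2] = 1.0   # include the right boundary point
    for d in range(1, degree + 1):                     # raise the degree recursively
        B_new = np.zeros((len(x), len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            den_l = knots[i + d] - knots[i]
            den_r = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / den_l * B[:, i] if den_l > 0 else 0.0
            right = (knots[i + d + 1] - x) / den_r * B[:, i + 1] if den_r > 0 else 0.0
            B_new[:, i] = left + right
        B = B_new
    return B

rng = np.random.default_rng(1)
theta = np.clip(rng.normal(size=200), -2.5, 2.5)           # latent traits, kept inside the knot range
interior = np.linspace(-2.5, 2.5, 8)
knots = np.concatenate([[-2.5] * 3, interior, [2.5] * 3])  # clamped cubic knot vector
X = bspline_basis(theta, knots)                            # 200 x 10 spline design matrix

coef = rng.normal(scale=0.5, size=X.shape[1])              # hypothetical spline coefficients
beta_j, sigma_j = 3.0, 0.4                                 # hypothetical item time intensity and spread
mean_log_rt = beta_j + X @ coef                            # smooth, possibly non-linear E[log RT | theta]
log_rt = rng.normal(mean_log_rt, sigma_j)                  # simulated log response times for one item
```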

Citations: 0
Compound Optimal Design for Online Item Calibration Under the Two-Parameter Logistic Model.
IF 1.2 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-07-01 | Epub Date: 2025-01-28 | DOI: 10.1177/01466216251316276
Lihong Song, Wenyi Wang

Under the theory of sequential design, a compound optimal design with two optimality criteria can be used to solve the problem of efficiently calibrating the item parameters of an item response theory model. In order to efficiently calibrate item parameters in computerized testing, a compound optimal design is proposed for the simultaneous estimation of item difficulty and discrimination parameters under the two-parameter logistic model, which adaptively focuses on optimizing the parameter that is difficult to estimate. The compound optimal design using the acceptance probability can provide ability design points to optimize the item difficulty and discrimination parameters, respectively. Simulation and real data analysis studies showed that the compound optimal design outperformed the D-optimal and random designs in terms of the recovery of both the discrimination and difficulty parameters.
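As a rough illustration of what a compound criterion trades off, the sketch below computes the 2PL Fisher information matrix for (a, b) at a given ability and greedily picks the next calibration examinee that minimizes a weighted sum of the two parameters' asymptotic variances. The weighting, the provisional parameter values, and the candidate pool are assumptions for illustration; the paper's acceptance-probability mechanism is not reproduced here.

```python
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information matrix for (a, b) of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    g = np.array([theta - b, -a])          # gradient of a*(theta - b) w.r.t. (a, b)
    return p * (1.0 - p) * np.outer(g, g)

def compound_criterion(design_thetas, a, b, lam=0.5, prior=1e-3):
    """Weighted compound criterion: lam weights the precision of the
    discrimination parameter a, (1 - lam) that of the difficulty parameter b."""
    M = prior * np.eye(2) + sum(info_2pl(t, a, b) for t in design_thetas)
    V = np.linalg.inv(M)                   # asymptotic covariance of (a_hat, b_hat)
    return -(lam * V[0, 0] + (1.0 - lam) * V[1, 1])   # larger is better

a_prov, b_prov = 1.2, 0.5                  # provisional item parameter estimates (assumed)
pool = np.linspace(-3, 3, 61)              # candidate examinee abilities
design = [-1.0, 0.0, 1.0]                  # abilities already assigned to this item
best = max(pool, key=lambda t: compound_criterion(design + [t], a_prov, b_prov, lam=0.7))
print(f"next design point: theta = {best:.2f}")
```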

Citations: 0
How to Make Sense of Reliability? Common Language Interpretation of Reliability and the Relation of Reliability to Effect Size.
IF 1 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-06-24 | DOI: 10.1177/01466216251350159
Jari Metsämuuronen, Timi Niemensivu

Communicating the factual meaning of a particular reliability estimate is sometimes difficult. What does a specific reliability estimate of 0.80 or 0.95 mean in common language? Deflation-corrected estimates of reliability (DCER) using Somers' D or Goodman-Kruskal G as the item-score correlations are transformed into forms where specific estimates from the family of common language effect sizes are visible. This makes it possible to communicate reliability estimates using a common language and to evaluate the magnitude of a particular reliability estimate in the same way and with the same metric as we do with effect size estimates. Using a DCER, we can say that with k = 40 items, if the reliability is 0.95, in 80 out of 100 random pairs of test takers from different subpopulations on all items combined, those with a higher item response will also score higher on the test. In this case, using the thresholds familiar from effect sizes, we can say that the reliability is "very high." The transformation of the reliability estimate into a common language effect size depends on the size of the item-score association estimates and the number of items, so no closed-form equations for the transformations are given. However, relevant thresholds are provided for practical use.
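To make the common-language reading concrete, here is a generic Python sketch (not the authors' DCER transformation, which has no closed form) that estimates, for simulated data, the proportion of random pairs in which the person with the higher item score also has the higher total score, in the spirit of Somers' D or Goodman-Kruskal G. All data-generating values are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data: 500 test takers, k = 40 dichotomous items driven by one latent trait.
n, k = 500, 40
theta = rng.normal(size=n)
diff = rng.normal(size=k)
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - diff[None, :])))
items = rng.binomial(1, p)                 # item scores
total = items.sum(axis=1)                  # summed test score

def concordance(item_scores, totals, n_pairs=100_000):
    """Among random pairs that differ on the item and on the total, how often
    does the person with the higher item score also have the higher total?"""
    i = rng.integers(0, len(totals), size=n_pairs)
    j = rng.integers(0, len(totals), size=n_pairs)
    d_item = item_scores[i] - item_scores[j]
    d_tot = totals[i] - totals[j]
    keep = (d_item != 0) & (d_tot != 0)
    return np.mean(np.sign(d_item[keep]) == np.sign(d_tot[keep]))

# Average over items: roughly "x out of 100 random pairs agree in direction".
cl = np.mean([concordance(items[:, m], total) for m in range(k)])
print(f"average item-total concordance = {100 * cl:.0f} out of 100 pairs")
```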

Citations: 0
Increase of Uncertainty in Summed-Score-Based Scoring in Non-Rasch IRT.
IF 1 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-06-12 | DOI: 10.1177/01466216251350342
Eisuke Segawa

Summed-score (SS)-based scoring in non-Rasch IRT allows for pencil-and-paper administration and is used in the Patient-Reported Outcomes Measurement Information System (PROMIS) alongside response-pattern-based scoring. However, this convenience comes with an increase in uncertainty (the increase) associated with SS scoring. The increase can be quantified through the relationship between Bayesian SS and RP scoring. Given an SS of s, the SS posterior is a weighted sum of RP posteriors, with weights representing the marginal probabilities of RPs. From this mixture, the SS score (SS posterior mean) is a weighted sum of RP posterior means, and its uncertainty (variance of the SS posterior) is decomposed into the uncertainty of RP scoring (the weighted sum of RP posterior variances) and the increase (variance of RP posterior means). Without quantifying the increase, PROMIS recommends RP scoring for greater accuracy, suggesting SS scoring as a second option. Using variance decomposition, we quantified the increases for two short forms (SFs). In one, the increase is very small, making SS scoring as accurate as RP scoring, while in the other, the increase is large, indicating SS scoring may not be a viable second option. The increase varies widely, influencing scoring decisions, and should be reported for each SF when SS scoring is used.
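The decomposition described above is the law of total variance applied to the mixture of response-pattern (RP) posteriors that share a summed score. A minimal numerical sketch with made-up posterior summaries:

```python
import numpy as np

# Hypothetical response patterns sharing the same summed score s:
# posterior mean m_r and variance v_r of theta under each pattern, and the
# marginal probability w_r of observing that pattern (weights sum to 1).
m = np.array([-0.10, 0.05, 0.30])      # RP posterior means (illustrative values)
v = np.array([0.090, 0.085, 0.095])    # RP posterior variances
w = np.array([0.50, 0.30, 0.20])       # marginal probabilities of the RPs

ss_mean = np.sum(w * m)                        # SS score = weighted sum of RP means
within = np.sum(w * v)                         # uncertainty already present in RP scoring
increase = np.sum(w * (m - ss_mean) ** 2)      # "the increase": spread of the RP means
ss_var = within + increase                     # total SS posterior variance

print(f"SS score {ss_mean:.3f}, SS variance {ss_var:.4f} "
      f"(RP component {within:.4f} + increase {increase:.4f})")
```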

Citations: 0
tna: An R Package for Transition Network Analysis.
IF 1 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-06-05 | DOI: 10.1177/01466216251348840
Santtu Tikka, Sonsoles López-Pernas, Mohammed Saqr

Understanding the dynamics of transitions plays a central role in educational research, informing studies of learning processes, motivation shifts, and social interactions. Transition network analysis (TNA) is a unified framework of probabilistic modeling and network analysis for capturing the temporal and relational aspects of transitions between events or states of interest. We introduce the R package tna that implements procedures for estimating the TNA models, building the transition networks, identifying patterns and communities, computing centrality measures, and visualizing the networks. The package also implements several functions for statistical procedures that can be used to assess differences between groups, stability of centrality measures and importance of specific transitions.
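The package itself is written in R; as a language-neutral illustration of the core quantity TNA works with, the Python sketch below estimates a first-order transition probability matrix from event sequences, which can then be read as a weighted, directed network. The states and sequences are invented, and this is not the tna package's API.

```python
import numpy as np

def transition_matrix(sequences, states):
    """Maximum-likelihood first-order transition probabilities estimated from
    event sequences (rows: from-state, columns: to-state)."""
    idx = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[idx[a], idx[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy learning-event sequences (hypothetical states and data).
states = ["plan", "read", "practice", "reflect"]
sequences = [
    ["plan", "read", "practice", "reflect", "practice"],
    ["read", "practice", "practice", "reflect"],
    ["plan", "practice", "read", "practice"],
]
P = transition_matrix(sequences, states)
print(np.round(P, 2))   # weighted, directed adjacency matrix of the transition network
```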

Citations: 0
On the Use of Elbow Plot Method for Class Enumeration in Factor Mixture Models.
IF 1 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-05-20 | DOI: 10.1177/01466216251344288
Sedat Sen, Allan S Cohen

Application of factor mixture models (FMMs) requires determining the correct number of latent classes. A number of studies have examined the performance of several information criterion (IC) indices, but as yet none have studied the effectiveness of the elbow plot method. In this study, therefore, the effectiveness of the elbow plot method was compared with the lowest-value criterion and the difference method calculated from five commonly used IC indices. Results of a simulation study showed that the elbow plot method detected the generating model at least 90% of the time for two- and three-class FMMs. Results also showed that the elbow plot method did not perform well under two-factor and four-class conditions. The performance of the elbow plot method was generally better than that of the lowest-IC-value criterion and the difference method under two- and three-class conditions. For the four-latent-class conditions, there were no meaningful differences between the results of the elbow plot method and the lowest-value criterion method. On the other hand, the difference method outperformed the other two methods in conditions with two factors and four classes.
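For readers who want an automated counterpart to eyeballing the plot, here is one simple elbow heuristic (the largest sag below the chord joining the first and last IC values) applied to hypothetical BIC values; it is a generic sketch, not the decision rule evaluated in the study.

```python
import numpy as np

def elbow_index(ic_values):
    """Simple elbow heuristic: rescale both axes to [0, 1], then pick the point
    that sags farthest below the straight line joining the first and last values."""
    y = np.asarray(ic_values, dtype=float)
    x = np.arange(len(y), dtype=float)
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())
    chord = yn[0] + (yn[-1] - yn[0]) * xn      # straight line between the endpoints
    return int(np.argmax(chord - yn))          # 0-based index of the elbow

# Hypothetical BIC values for factor mixture models with 1-6 latent classes.
bic = [21250, 20310, 20285, 20279, 20276, 20275]
print("suggested number of classes:", elbow_index(bic) + 1)
```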

Citations: 0
A Generalized Multi-Detector Combination Approach for Differential Item Functioning Detection.
IF 1.2 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-05-01 | Epub Date: 2024-12-19 | DOI: 10.1177/01466216241310602
Shan Huang, Hidetoki Ishii

Many studies on differential item functioning (DIF) detection rely on single detection methods (SDMs), each of which necessitates specific assumptions that may not always be validated. Using an inappropriate SDM can lead to diminished accuracy in DIF detection. To address this limitation, a novel multi-detector combination (MDC) approach is proposed. Unlike SDMs, MDC effectively evaluates the relevance of different SDMs under various test conditions and integrates them using supervised learning, thereby mitigating the risk associated with selecting a suboptimal SDM for DIF detection. This study aimed to validate the accuracy of the MDC approach by applying five types of SDMs and four distinct supervised learning methods in MDC modeling. Model performance was assessed using the area under the curve (AUC), which provided a comprehensive measure of the ability of the model to distinguish between classes across all threshold levels, with higher AUC values indicating higher accuracy. The MDC methods consistently achieved higher average AUC values compared to SDMs in both matched test sets (where test conditions align with the training set) and unmatched test sets. Furthermore, MDC outperformed all SDMs under each test condition. These findings indicated that MDC is highly accurate and robust across diverse test conditions, establishing it as a viable method for practical DIF detection.
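A minimal sketch of the combination idea, with invented data: treat the statistics produced by several single detection methods as features, train a supervised learner on items whose DIF status is known from simulation, and evaluate with AUC. scikit-learn's logistic regression stands in for whichever learners the study actually used, and the three "SDM statistics" here are random stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical training table: one row per item, columns are statistics from
# several single detection methods, label = 1 if the item was simulated with DIF.
n_items = 600
X = rng.normal(size=(n_items, 3))                       # stand-ins for SDM statistics
true_dif = rng.binomial(1, 0.3, size=n_items)
X[true_dif == 1] += np.array([0.9, 0.6, 0.4])           # DIF items shift the statistics

train, test = slice(0, 400), slice(400, None)
clf = LogisticRegression().fit(X[train], true_dif[train])   # the "combiner"
scores = clf.predict_proba(X[test])[:, 1]
print("MDC-style AUC:", round(roc_auc_score(true_dif[test], scores), 3))
```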

Citations: 0
Inference of Correlations Among Testlet Effects: A Latent Variable Selection Method.
IF 1.2 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-05-01 | Epub Date: 2024-12-26 | DOI: 10.1177/01466216241310598
Xin Xu, Jinxin Guo, Tao Xin

In psychological and educational measurement, a testlet-based test is a common and popular format, especially in some large-scale assessments. In modeling testlet effects, a standard bifactor model, as a common strategy, assumes the different testlet effects and the main effect to be fully independently distributed. However, it is difficult in practice to establish the perfectly independent clusters that this assumption requires. To address this issue, correlations among testlets could be taken into account when fitting the data. Moreover, one may desire to maintain a good practical interpretation of the sparse loading matrix. In this paper, we propose data-driven learning of the significant correlations in the covariance matrix through a latent variable selection method. Under the proposed method, a regularization is performed on the weak correlations of the extended bifactor model. Further, a stochastic expectation maximization algorithm is employed for efficient computation. Results from simulation studies show the consistency of the proposed method in selecting significant correlations. Empirical data from the 2015 Programme for International Student Assessment are analyzed using the proposed method as an example.
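As a toy illustration of selecting non-zero correlations among testlet effects (not the paper's stochastic-EM estimator), the sketch below soft-thresholds the off-diagonal entries of an assumed correlation matrix, zeroing weak correlations while retaining strong ones.

```python
import numpy as np

def select_correlations(R, lam=0.15):
    """Keep only correlations whose magnitude survives L1-style soft
    thresholding; the unit diagonal is left untouched."""
    off = R - np.eye(len(R))
    shrunk = np.sign(off) * np.maximum(np.abs(off) - lam, 0.0)
    return np.eye(len(R)) + shrunk

# Hypothetical correlation matrix among three testlet effects.
R = np.array([[1.00, 0.05, 0.40],
              [0.05, 1.00, 0.10],
              [0.40, 0.10, 1.00]])
print(select_correlations(R))   # weak entries are zeroed; the 0.40 pair survives (shrunk to 0.25)
```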

Citations: 0
Adaptive Measurement of Change in the Context of Item Parameter Drift.
IF 1.2 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-05-01 | Epub Date: 2024-12-30 | DOI: 10.1177/01466216241310599
Allison W Cooperman, Ming Him Tai, Joseph N DeWeese, David J Weiss

Adaptive measurement of change (AMC) uses computerized adaptive testing (CAT) to measure and test the significance of intraindividual change on one or more latent traits. The extant AMC research has so far assumed that item parameter values are constant across testing occasions. Yet item parameters might change over time, a phenomenon termed item parameter drift (IPD). The current study examined AMC's performance in the context of IPD with unidimensional, dichotomous CATs across two testing occasions. A Monte Carlo simulation revealed that AMC false and true positive rates were primarily affected by changes in the difficulty parameter. False positive rates were related to the location of the drift items relative to the latent trait continuum, as the administration of more drift items spuriously increased the magnitude of estimated trait change. Moreover, true positive rates depended upon an interaction between the direction of difficulty parameter drift and the latent trait change trajectory. A follow-up simulation further showed that the number of items in the CAT with parameter drift impacted AMC false and true positive rates, with these relationships moderated by IPD characteristics and the latent trait change trajectory. It is recommended that test administrators confirm the absence of IPD prior to using AMC for measuring intraindividual change with educational and psychological tests.
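A common way to test intraindividual change between two CAT occasions is a Z statistic that divides the difference in trait estimates by the standard error of that difference; the sketch below uses made-up estimates and is a generic illustration, not necessarily the exact statistic used in this study.

```python
import math

def change_z(theta1, se1, theta2, se2):
    """Z statistic for intraindividual change between two CAT occasions:
    difference in trait estimates divided by the standard error of the
    difference (estimates assumed independent across occasions)."""
    return (theta2 - theta1) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical examinee: occasion-1 and occasion-2 CAT estimates with SEs.
z = change_z(theta1=-0.20, se1=0.30, theta2=0.55, se2=0.28)
p_two_sided = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
```

Under item parameter drift, the occasion-2 trait estimate (and hence any statistic built on it) can be biased even when the true trait has not changed, which is the mechanism behind the spuriously inflated change estimates and false positive rates reported above.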

Citations: 0
An Information Manifold Perspective for Analyzing Test Data.
IF 1.2 | CAS Zone 4, Psychology | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2025-05-01 | Epub Date: 2024-12-20 | DOI: 10.1177/01466216241310600
James O Ramsay, Juan Li, Joakim Wallmark, Marie Wiberg

Modifications of current psychometric models for analyzing test data are proposed that produce an additive scale measure of information. This information measure is a one-dimensional space curve or curved surface manifold that is invariant across varying manifold indexing systems. The arc length along a curve manifold is used as it is an additive metric having a defined zero and a version of the bit as a unit. This property, referred to here as the scope of the test or an item, facilitates the evaluation of graphs and numerical summaries. The measurement power of the test is defined by the length of the manifold, and the performance or experiential level of a person by a position along the curve. In this study, we also use all information from the items including the information from the distractors. Test data from a large-scale college admissions test are used to illustrate the test information manifold perspective and to compare it with the well-known item response theory nominal model. It is illustrated that the use of information theory opens a vista of new ways of assessing item performance and inter-item dependency, as well as test takers' knowledge.
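To make the arc-length idea concrete, the short sketch below computes the length of a sampled space curve by summing Euclidean segment lengths; the curve itself is an arbitrary stand-in for an information manifold, chosen only to show that the measure is additive along the curve and does not depend on how the curve is indexed.

```python
import numpy as np

def arc_length(curve_points):
    """Additive arc-length metric: sum of Euclidean segment lengths along a
    sampled curve (rows = points along the manifold, columns = coordinates)."""
    deltas = np.diff(curve_points, axis=0)
    return float(np.sum(np.linalg.norm(deltas, axis=1)))

# Hypothetical 1-D information manifold embedded in 3 dimensions, sampled on a
# fine grid of the indexing variable.
t = np.linspace(0.0, 1.0, 2001)
curve = np.column_stack([np.sin(2 * t), t ** 2, np.log1p(3 * t)])
total = arc_length(curve)
half = arc_length(curve[: len(t) // 2 + 1])
print(f"scope of full curve = {total:.3f}, first half = {half:.3f} (additive along the curve)")
```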

Citations: 0