
Latest Publications in Applied Psychological Measurement

Rise of the Machine: Detecting Aberrant Response Patterns in Survey Instruments Using Autoencoder.
IF 1.2 | Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2026-02-13 | DOI: 10.1177/01466216261425242
Cody Ding

Survey questionnaires are essential tools in psychological and educational research, as the data they gather directly influence research conclusions and policy decisions. A major challenge in ensuring data quality is identifying aberrant response patterns, which can jeopardize research outcomes by introducing errors into subsequent analyses and potentially leading to flawed theoretical conclusions and misguided practical applications. This study presents a machine learning solution that employs autoencoder neural networks as a computational method for detecting aberrant response patterns in survey data. We evaluated the effectiveness of autoencoder neural networks in identifying response anomalies using both simulated and real data. The results indicate that this approach can effectively detect anomalous responses, giving researchers more options for their analyses and subsequent conclusions and, ultimately, enhancing the trustworthiness of findings in psychological and educational research.
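The abstract does not specify the network architecture, so as a hedged illustration the sketch below uses the simplest autoencoder — a linear one, equivalent to PCA — to flag survey respondents whose rows reconstruct poorly. The function name `flag_aberrant`, the simulated Likert data, and the 95th-percentile cutoff are all illustrative assumptions, not the author's method.

```python
import numpy as np

def flag_aberrant(X, n_components=3, quantile=0.95):
    """Flag rows of a respondent-by-item matrix whose reconstruction
    error under a linear autoencoder (PCA subspace) is unusually large."""
    Xc = X - X.mean(axis=0)                      # center each item
    # SVD yields the optimal linear encoder/decoder for this data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]                        # decoder weights
    X_hat = Xc @ W.T @ W                         # encode, then decode
    err = np.mean((Xc - X_hat) ** 2, axis=1)     # per-person error
    return err > np.quantile(err, quantile), err

# Simulated 5-point Likert data: 500 trait-driven respondents
# plus 25 purely random responders appended at the end.
rng = np.random.default_rng(0)
theta = rng.normal(size=(500, 1))                # one latent trait
normal = np.clip(np.round(3 + theta + rng.normal(0, 0.5, (500, 20))), 1, 5)
random_resp = rng.integers(1, 6, size=(25, 20)).astype(float)
X = np.vstack([normal, random_resp])

flags, err = flag_aberrant(X)
print("flagged among the 25 random responders:", int(flags[500:].sum()))
```

Random responders sit far from the low-dimensional subspace that consistent respondents span, so their reconstruction error is markedly higher.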

Citations: 0
The Impact of Latent Density Misspecification on Item Response Theory Equating Methods.
IF 1.2 | Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2026-02-12 | DOI: 10.1177/01466216261425440
Kyung Yong Kim, Seongeun Kim, Haeju Lee

Item response theory (IRT) observed score and true score equating are often conducted under the assumption that the latent variable is normally distributed. Although this may be a reasonable assumption for many educational and psychological assessments, not all variables can be approximated by a normal distribution. Under the common-item nonequivalent groups design, the current study examined the impact of latent density misspecification on IRT observed score and true score equating. Specifically, using both simulated and real data sets, we compared equating results from two separate calibration estimates based on the Stocking-Lord linking method with normal and uniform weights and from three concurrent calibration estimates obtained with different characterizations of the latent densities for the old and new groups. In general, the concurrent calibration method with the latent densities for the two groups estimated using the empirical histogram method produced equating results with the least error for most study conditions. Using normal weights with the Stocking-Lord method generally performed much better than using uniform weights; however, the overall performance of the Stocking-Lord method with normal weights was acceptable only if the latent densities for the two groups were normal or close to normal.
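The Stocking-Lord method referenced here chooses linking constants (A, B) that minimize a weighted squared distance between the two forms' test characteristic curves. A minimal sketch of that criterion under the 2PL model, with hypothetical common-item parameters and the normal/uniform weighting schemes the study compares (everything here is illustrative, not the paper's data):

```python
import numpy as np

def p2pl(theta, a, b):
    """2PL response probability for each (theta, item) pair."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def stocking_lord_loss(A, B, a_new, b_new, a_old, b_old, theta, w):
    """Weighted squared distance between test characteristic curves
    after rescaling the new-form items with constants (A, B)."""
    tcc_old = p2pl(theta, a_old, b_old).sum(axis=1)
    tcc_new = p2pl(theta, a_new / A, A * b_new + B).sum(axis=1)
    return float(np.sum(w * (tcc_old - tcc_new) ** 2))

theta = np.linspace(-4, 4, 41)                         # quadrature points
w_normal = np.exp(-theta**2 / 2); w_normal /= w_normal.sum()
w_uniform = np.full_like(theta, 1.0 / theta.size)

# Hypothetical common items; the new form's scale differs by A=1.1, B=0.3
a_old = np.array([1.0, 1.2, 0.8]); b_old = np.array([-0.5, 0.0, 0.7])
a_new = a_old * 1.1
b_new = (b_old - 0.3) / 1.1

loss_true = stocking_lord_loss(1.1, 0.3, a_new, b_new, a_old, b_old, theta, w_normal)
loss_off = stocking_lord_loss(1.0, 0.0, a_new, b_new, a_old, b_old, theta, w_normal)
print(loss_true, loss_off)   # the true constants drive the loss to (near) zero
```

Swapping `w_normal` for `w_uniform` changes which ability regions dominate the criterion, which is exactly the normal-versus-uniform weighting contrast examined in the abstract.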

Citations: 0
Score-Based Tests With Fixed Effects Person Parameters in Item Response Theory: Detecting Model Misspecification Including Differential Item Functioning.
IF 1.2 | Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2026-02-09 | DOI: 10.1177/01466216261422480
Rudolf Debelak, Charles C Driver

We present a fast, score-based test for detecting model misspecification in item response theory (IRT) models that remains valid when person parameters are treated as fixed effects, as may be done for very large data sets. The new approximation (i) eliminates the need to pre-specify ability groups or priors for person abilities, (ii) does not require explicit functional form assumptions, (iii) works with two estimators designed for very high item/person counts, constrained joint maximum likelihood (CJML) and joint maximum a posteriori (JMAP), and (iv) requires only a single model fit, making DIF screening faster and simpler than alternatives based on model comparisons. A spline-based residualization step further suppresses spurious Type I error when the ordering covariate is correlated with ability. Simulations with the two-parameter logistic model show nominal error rates and high power once examinees contribute around 15-20 responses; only extremely short tests (around 10 items) still pose challenges under strong impact. An application to 1,602 reading items and 57,684 students from the Mindsteps platform demonstrates scalability and practical value, flagging 13% of items for gender-related DIF and correlating highly with conventional approaches that explicitly model DIF. Together, these results position the proposed test as a robust, computation-light diagnostic for large-scale assessments when classical random-effects approaches are infeasible, the ability group structure is unknown or complex, or the shape of DIF effects is unknown or complex.

Citations: 0
Optimal Item Calibration in the Context of the Swedish Scholastic Aptitude Test.
IF 1.2 | Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2026-02-06 | DOI: 10.1177/01466216261420758
Jonas Bjermo, Ellinor Fackle Fornius, Frank Miller

Large-scale achievement tests require item banks containing items for use in future tests. Before an item is included in the bank, its characteristics need to be estimated; this process is called item calibration. For the quality of future achievement tests, it is important to perform this calibration well, and it is desirable to estimate the item characteristics as efficiently as possible. Methods of optimal design have been developed to allocate pretest items to the examinees whose abilities are best suited to them. Theoretical evidence shows advantages of ability-dependent allocation of pretest items. However, it is not clear whether these theoretical results also hold in a real testing situation. In this paper, we investigate the performance of an optimal ability-dependent allocation in the context of the Swedish Scholastic Aptitude Test (SweSAT) and quantify the gain from using the optimal allocation. On average over all items, we see improved calibration precision. While this average improvement is moderate, we are able to identify the kinds of items for which the method works well, which enables targeting specific item types for optimal calibration. We also discuss possibilities for improving the method.
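The intuition behind ability-dependent allocation is that, under the 2PL model, an item's Fisher information I(θ) = a²P(θ)(1 − P(θ)) peaks where ability is near the item's difficulty. A minimal sketch (hypothetical item parameters and examinee pool, not the SweSAT procedure) that routes a pretest item to the most informative examinees:

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

# Ability-dependent allocation: give the pretest item to the examinees
# whose provisional abilities contribute most to calibrating it.
rng = np.random.default_rng(1)
abilities = rng.normal(size=1000)          # provisional ability estimates
a_guess, b_guess = 1.0, 0.8                # hypothetical pretest item
info = item_information(abilities, a_guess, b_guess)
chosen = np.argsort(info)[-200:]           # 200 most informative examinees

print(abilities[chosen].mean())            # clusters near b_guess
```

The selected examinees cluster around the item's presumed difficulty, which is why ability-dependent allocation sharpens calibration relative to random assignment.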

Citations: 0
Influence of Uninformative Prior Distributions for MCMC Method on Estimating Variance Components in Generalizability Theory.
IF 1.2 | Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2026-02-03 | DOI: 10.1177/01466216261415631
Guangming Li
The Markov chain Monte Carlo (MCMC) method is increasingly used to estimate variance components in generalizability theory (GT). However, although uninformative priors are an essential part of the MCMC method, they have not been systematically explored, and different GT studies vary in the uninformative priors they use. This study focused on the effect of different uninformative priors on the estimation of variance components. Based on a p × i × r design, eight uninformative prior distributions were chosen for a simulation study and an empirical study: σ² ∼ inv-gamma(0.001, 0.001) [prior 1], σ² ∼ inv-gamma(1, 1) [prior 2], σ² ∼ uniform(0.001, 1000) [prior 3], σ ∼ uniform(0, 100) [prior 4], log(σ²) ∼ uniform(−10, 10) [prior 5], 1/σ² ∼ pareto(1, 0.001) [prior 6], σ²/(σ² + τ²)² ∼ uniform [prior 7], and σ²/(2τ(σ + τ))² ∼ uniform(0, 1) [prior 8]. Three posterior point estimates (the mean, median, and mode) were computed for both complete data and data with 10% missing/sparse responses. The simulation and empirical results show that: (1) prior 1 gives the best and most stable posterior point estimates under most conditions, whereas prior 6 is consistently the worst; (2) the differences among the priors appear mainly in the variance components σ_i² and σ_r², for which prior 6 shows pronounced extreme bias, reaching 281.09 and 167.59, respectively; (3) posterior mean estimates always produce the largest bias, while posterior median estimates are the best; (4) when a variance component has few levels, the estimates differ more across the uninformative priors; (5) the results for complete data and for 10% missing/sparse data are essentially the same, so a small amount of missing/sparse data has little influence on the results. Running times for the eight priors ranged from 489.78 to 692.58 seconds, with little difference among them.
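To make concrete how an inverse-gamma prior such as prior 1 enters an MCMC run, here is a minimal Gibbs sampler for a single variance in a toy normal model (deliberately far simpler than the paper's p × i × r design; the data, sample size, and iteration counts are illustrative):

```python
import numpy as np

# Toy model: y_i ~ N(mu, sigma2), flat prior on mu,
# sigma2 ~ inv-gamma(0.001, 0.001)  (the paper's "prior 1").
rng = np.random.default_rng(42)
y = rng.normal(loc=2.0, scale=1.5, size=200)
a0, b0 = 0.001, 0.001                     # uninformative inv-gamma prior
n = len(y)

mu, sigma2 = y.mean(), y.var()            # crude starting values
draws = []
for _ in range(5000):
    # mu | sigma2, y  ~  N(ybar, sigma2/n)  under the flat prior on mu
    mu = rng.normal(y.mean(), np.sqrt(sigma2 / n))
    # sigma2 | mu, y  ~  inv-gamma(a0 + n/2, b0 + sum((y - mu)^2)/2)
    shape = a0 + n / 2
    rate = b0 + 0.5 * np.sum((y - mu) ** 2)
    sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)   # inverse of a gamma draw
    draws.append(sigma2)

post = np.array(draws[1000:])             # discard burn-in
print(np.mean(post), np.median(post))     # posterior mean and median
```

With a0 = b0 = 0.001 the prior contributes almost nothing, so both point estimates should land near the sample variance of y; comparing the mean and median draws mirrors the paper's comparison of posterior point estimators.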
{"title":"Influence of Uninformative Prior Distributions for MCMC Method on Estimating Variance Components in Generalizability Theory.","authors":"Guangming Li","doi":"10.1177/01466216261415631","DOIUrl":"10.1177/01466216261415631","url":null,"abstract":"&lt;p&gt;&lt;p&gt;The Markov chain Monte Carlo (MCMC) method is more and more widely used to estimate variance components in generalizability theory (GT). However, as an essential part of MCMC method, uninformative priors haven't been explored and different GT researches vary in the use of uninformative priors. This study focused on effect of the different uninformative priors on estimating variance components. Based on &lt;i&gt;p × i × r&lt;/i&gt; design, eight uninformative prior distributions were chosen for simulation study and empirical study, including &lt;math&gt; &lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;mo&gt;∼&lt;/mo&gt; &lt;mi&gt;i&lt;/mi&gt; &lt;mi&gt;n&lt;/mi&gt; &lt;mi&gt;v&lt;/mi&gt; &lt;mo&gt;-&lt;/mo&gt; &lt;mi&gt;g&lt;/mi&gt; &lt;mi&gt;a&lt;/mi&gt; &lt;mi&gt;m&lt;/mi&gt; &lt;mi&gt;m&lt;/mi&gt; &lt;mi&gt;a&lt;/mi&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mn&gt;0.001&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mn&gt;0.001&lt;/mn&gt;&lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt; [prior 1], &lt;math&gt; &lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;mo&gt;∼&lt;/mo&gt; &lt;mi&gt;i&lt;/mi&gt; &lt;mi&gt;n&lt;/mi&gt; &lt;mi&gt;v&lt;/mi&gt; &lt;mo&gt;-&lt;/mo&gt; &lt;mi&gt;g&lt;/mi&gt; &lt;mi&gt;a&lt;/mi&gt; &lt;mi&gt;m&lt;/mi&gt; &lt;mi&gt;m&lt;/mi&gt; &lt;mi&gt;a&lt;/mi&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt; [prior 2], &lt;math&gt; &lt;mrow&gt; &lt;msup&gt;&lt;mrow&gt;&lt;mo&gt; &lt;/mo&gt; &lt;mi&gt;σ&lt;/mi&gt;&lt;/mrow&gt; 
&lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;mo&gt;∼&lt;/mo&gt; &lt;mi&gt;u&lt;/mi&gt; &lt;mi&gt;n&lt;/mi&gt; &lt;mi&gt;i&lt;/mi&gt; &lt;mi&gt;f&lt;/mi&gt; &lt;mi&gt;o&lt;/mi&gt; &lt;mi&gt;r&lt;/mi&gt; &lt;mi&gt;m&lt;/mi&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mn&gt;0.001&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mn&gt;1000&lt;/mn&gt;&lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt; &lt;b&gt;[&lt;/b&gt;prior 3&lt;b&gt;]&lt;/b&gt;, &lt;math&gt;&lt;mrow&gt;&lt;mi&gt;σ&lt;/mi&gt; &lt;mo&gt;∼&lt;/mo&gt; &lt;mi&gt;u&lt;/mi&gt; &lt;mi&gt;n&lt;/mi&gt; &lt;mi&gt;i&lt;/mi&gt; &lt;mi&gt;f&lt;/mi&gt; &lt;mi&gt;o&lt;/mi&gt; &lt;mi&gt;r&lt;/mi&gt; &lt;mi&gt;m&lt;/mi&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mn&gt;0&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mn&gt;100&lt;/mn&gt;&lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt; [prior 4], &lt;math&gt;&lt;mrow&gt;&lt;mi&gt;log&lt;/mi&gt; &lt;mo&gt;⁡&lt;/mo&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;mo&gt;∼&lt;/mo&gt; &lt;mi&gt;u&lt;/mi&gt; &lt;mi&gt;n&lt;/mi&gt; &lt;mi&gt;i&lt;/mi&gt; &lt;mi&gt;f&lt;/mi&gt; &lt;mi&gt;o&lt;/mi&gt; &lt;mi&gt;r&lt;/mi&gt; &lt;mi&gt;m&lt;/mi&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mo&gt;-&lt;/mo&gt; &lt;mn&gt;10&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mn&gt;10&lt;/mn&gt;&lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt; [prior 5], &lt;math&gt; &lt;mrow&gt;&lt;mfrac&gt;&lt;mn&gt;1&lt;/mn&gt; &lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;/mfrac&gt; &lt;mo&gt;∼&lt;/mo&gt; &lt;mi&gt;p&lt;/mi&gt; &lt;mi&gt;a&lt;/mi&gt; &lt;mi&gt;r&lt;/mi&gt; &lt;mi&gt;e&lt;/mi&gt; &lt;mi&gt;t&lt;/mi&gt; &lt;mi&gt;o&lt;/mi&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mn&gt;1&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mn&gt;0.001&lt;/mn&gt;&lt;/mrow&gt; 
&lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;mo&gt; &lt;/mo&gt; &lt;mrow&gt;&lt;mo&gt;[&lt;/mo&gt; &lt;mrow&gt;&lt;mtext&gt;prior&lt;/mtext&gt; &lt;mo&gt; &lt;/mo&gt; &lt;mn&gt;6&lt;/mn&gt;&lt;/mrow&gt; &lt;mo&gt;]&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt; , &lt;math&gt; &lt;mrow&gt; &lt;mfrac&gt;&lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;msup&gt;&lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;mo&gt;+&lt;/mo&gt; &lt;msup&gt;&lt;mi&gt;τ&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;/mfrac&gt; &lt;mo&gt;∼&lt;/mo&gt; &lt;mi&gt;u&lt;/mi&gt; &lt;mi&gt;n&lt;/mi&gt; &lt;mi&gt;i&lt;/mi&gt; &lt;mi&gt;f&lt;/mi&gt; &lt;mi&gt;o&lt;/mi&gt; &lt;mi&gt;r&lt;/mi&gt; &lt;mi&gt;m&lt;/mi&gt;&lt;/mrow&gt; &lt;/math&gt; [prior 7], and &lt;math&gt; &lt;mrow&gt; &lt;mfrac&gt;&lt;msup&gt;&lt;mi&gt;σ&lt;/mi&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;msup&gt;&lt;mrow&gt;&lt;mn&gt;2&lt;/mn&gt; &lt;mi&gt;τ&lt;/mi&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mi&gt;σ&lt;/mi&gt; &lt;mo&gt;+&lt;/mo&gt; &lt;mi&gt;τ&lt;/mi&gt;&lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;mn&gt;2&lt;/mn&gt;&lt;/msup&gt; &lt;/mfrac&gt; &lt;mo&gt;∼&lt;/mo&gt; &lt;mi&gt;u&lt;/mi&gt; &lt;mi&gt;n&lt;/mi&gt; &lt;mi&gt;i&lt;/mi&gt; &lt;mi&gt;f&lt;/mi&gt; &lt;mi&gt;o&lt;/mi&gt; &lt;mi&gt;r&lt;/mi&gt; &lt;mi&gt;m&lt;/mi&gt; &lt;mrow&gt;&lt;mo&gt;(&lt;/mo&gt; &lt;mrow&gt;&lt;mn&gt;0&lt;/mn&gt; &lt;mo&gt;,&lt;/mo&gt; &lt;mn&gt;1&lt;/mn&gt;&lt;/mrow&gt; &lt;mo&gt;)&lt;/mo&gt;&lt;/mrow&gt; &lt;/mrow&gt; &lt;/math&gt; [prior 8","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216261415631"},"PeriodicalIF":1.2,"publicationDate":"2026-02-03","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12867738/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146126804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Estimating and Fitting Non-Continuous Category-Scored Polytomous Items Under the Weighted Score Logistic Model and Its Simulation Study.
IF 1.2 | Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2026-01-28 | DOI: 10.1177/01466216261420305
Xiaozhu Jian, Buyun Dai, Yeqi Qing, YuanPing Deng

This study presents a novel extension of the weighted score logistic model (WSLM). The WSLM advances the traditional dichotomous logistic model by incorporating an additional weighted score parameter. The model is specifically designed to analyze non-continuous, category-scored polytomous items in educational and psychological testing contexts. Within the WSLM framework, the mean difficulty parameter reflects the overall item difficulty, and both the discrimination and mean difficulty parameters are estimated using marginal maximum likelihood estimation. A Monte Carlo simulation study was conducted to evaluate the performance of the WSLM; it demonstrated low bias and root mean square error (RMSE) for the item parameters, indicating accurate parameter recovery. Under most simulation conditions, the fit statistics Q1 and Q4 for polytomous items under the WSLM remained below their respective critical chi-square values, suggesting acceptable model-data fit. These results support the applicability and robustness of the WSLM in practical assessment settings involving complex scoring schemes.

Citations: 0
Generalized Cohen's d for Multiple Means and Polytomous Settings.
IF 1.2 | Tier 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2026-01-20 | DOI: 10.1177/01466216261416025
Jari Metsämuuronen

Cohen's d is the most commonly used estimator to quantify the magnitude of the difference between the means of two subpopulations. When comparing multiple populations simultaneously, Cohen's f can be used for the same purpose. Using their relationship in the dichotomous setting, several general formulas for d are derived that generalize d to the polytomous setting. The traditional simplified estimator d = 2f is studied as a shortcut estimator. It is strongly recommended to use the general formulas instead of the simplified ones when assessing the magnitude of the effect size, especially when the discrepancy of the extreme proportions of cases in the subpopulations exceeds 0.40.
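The d-f relationship the abstract builds on is easy to verify numerically with the standard textbook definitions: for two groups of equal size, the shortcut d = 2f holds exactly. A sketch (simulated data; the definitions below are the conventional pooled-SD forms, not the paper's generalized formulas):

```python
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference with the pooled within-group SD."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(sp2)

def cohens_f(groups):
    """Cohen's f: SD of group means about the weighted grand mean,
    divided by the pooled within-group SD."""
    ns = np.array([len(g) for g in groups], dtype=float)
    means = np.array([g.mean() for g in groups])
    grand = (ns * means).sum() / ns.sum()
    between = np.sqrt((ns * (means - grand) ** 2).sum() / ns.sum())
    within2 = sum((len(g) - 1) * g.var(ddof=1) for g in groups) / (ns.sum() - len(groups))
    return between / np.sqrt(within2)

rng = np.random.default_rng(7)
g1 = rng.normal(0.0, 1.0, 100)
g2 = rng.normal(0.5, 1.0, 100)        # equal group sizes: d = 2f exactly
d, f = cohens_d(g2, g1), cohens_f([g1, g2])
print(d, 2 * f)
```

With unequal group sizes or more than two groups the simple d = 2f shortcut no longer holds exactly, which is the situation the article's general formulas address.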

{"title":"Generalized Cohen's d for Multiple Means and Polytomous Settings.","authors":"Jari Metsämuuronen","doi":"10.1177/01466216261416025","DOIUrl":"10.1177/01466216261416025","url":null,"abstract":"<p><p>Cohen's <i>d</i> is the most commonly used estimator to quantify the magnitude of the difference between the means of two subpopulations. When comparing multiple populations simultaneously, Cohen's <i>f</i> can be used for the same purpose. Using their relationship in the dichotomous setting, several general formulas for <i>d</i> are derived that generalize <i>d</i> to the polytomous setting. The traditional simplified estimator <i>d</i> = 2<i>f</i> is studied as a shortcut estimator. It is strongly recommended to use the general formulas instead of the simplified ones when assessing the magnitude of the effect size, especially when the discrepancy of the extreme proportions of cases in the subpopulations exceeds 0.40.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216261416025"},"PeriodicalIF":1.2,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12819128/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146031375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Calibrating Multidimensional Assessments With Structural Missingness: An Application of a Multiple-Group Higher-Order IRT Model.
IF 1.2 4区 心理学 Q4 PSYCHOLOGY, MATHEMATICAL Pub Date : 2026-01-07 DOI: 10.1177/01466216251415011
Yale Quan, Chun Wang

Educational constructs are becoming increasingly complex and are often conceptualized at both a general level and a subdomain level. It is often desirable to report scores from both levels simultaneously. Measuring such complex constructs, however, requires an item bank far too large for a student to complete in any reasonable timeframe. Furthermore, most current score-reporting practices either report only subdomain scores or compute the general domain score post hoc. We propose that a multiple-group HO-IRT model with structural missingness can be used to report general and subdomain scores simultaneously while controlling assessment length. Although the model itself is not new, we consider a novel application scenario using a NEAT design with both a representative and a non-representative anchor test. While a representative anchor test is recommended in the literature, it is sometimes unrealistic in practice when the multidimensional construct shifts over time. Hence, exploring the parameter recovery of the multiple-group HO-IRT model in the presence of a non-representative anchor test is especially interesting and important. We show, through Monte Carlo simulation, that when the correlation between the higher- and lower-order factors is moderate, the RMSE of IRT estimates obtained under a non-representative anchor item set is comparable to the RMSE obtained under a representative anchor item set. Missing data were addressed using a full-information maximum likelihood approach to parameter estimation.
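Bias and RMSE, the recovery criteria used in Monte Carlo studies like this one, are computed by comparing estimates to the generating value across replications. The sketch below is a generic illustration, not the authors' simulation code; the sample estimates and the function name are invented.

```python
def bias_and_rmse(estimates, true_value):
    """Bias and root mean square error of a parameter estimator
    across Monte Carlo replications."""
    n = len(estimates)
    bias = sum(e - true_value for e in estimates) / n
    rmse = (sum((e - true_value) ** 2 for e in estimates) / n) ** 0.5
    return bias, rmse

# Four hypothetical replications of a discrimination-parameter estimate
bias, rmse = bias_and_rmse([0.90, 1.10, 1.05, 0.95], true_value=1.0)
print(bias, rmse)  # bias 0.0, rmse ~0.079
```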

{"title":"Calibrating Multidimensional Assessments With Structural Missingness: An Application of a Multiple-Group Higher-Order IRT Model.","authors":"Yale Quan, Chun Wang","doi":"10.1177/01466216251415011","DOIUrl":"10.1177/01466216251415011","url":null,"abstract":"<p><p>Educational Constructs are becoming increasingly complex and are often conceptualized at both a general level and a subdomain level. It is often desirable to report scores from both levels simultaneously. However, to measure such complex constructs, a very large item bank that is hard for a student to complete in any reasonable timeframe is needed. Furthermore, most current score reporting practices either only report subdomain scores, or the general domain score is calculated post hoc. We propose that a multiple group HO-IRT model with structural missingness can be used to simultaneously report general and subdomain scores while controlling assessment length. Although the model itself is not new, we consider a novel application scenario using a NEAT design with both a representative and non-representative anchor test. While a representative anchor test is recommended in literature, it is sometimes unrealistic in practice when the multidimensional construct shifts over time. Hence, exploring the parameter recovery of multiple group HO-IRT in the presence of non-representative anchor test is especially interesting and important. We show, through Monte Carlo simulation, that the RMSE of IRT estimates retrieved under a non-representative anchor item set with a moderate correlation between the higher- and lower-order factors, is comparable to the RMSE of IRT estimates retrieved under a representative anchor item set. 
Missing data were addressed using a full-information maximum likelihood approach to parameter estimation.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251415011"},"PeriodicalIF":1.2,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12779540/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145953603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Latent Trait Estimation in Multidimensional Forced Choice Measures: Latent Regression Multi-Unidimensional Pairwise Preference Model.
IF 1.2 4区 心理学 Q4 PSYCHOLOGY, MATHEMATICAL Pub Date : 2026-01-03 DOI: 10.1177/01466216251415189
Sean Joo, Philseok Lee, Stephen Stark

The field of psychometrics has made remarkable progress in developing item response theory (IRT) models for analyzing multidimensional forced choice (MFC) measures. This study introduces a method that enhances latent trait estimation under the Multi-Unidimensional Pairwise Preference (MUPP) model by incorporating latent regression modeling. To validate the efficacy of the new method, we conducted a comprehensive simulation study. The results provide compelling evidence that the proposed latent regression MUPP (LR-MUPP) model significantly improves the accuracy of latent trait estimation. This study opens new avenues for future research and encourages further development and refinement of MFC IRT models and their applications.

{"title":"Improving Latent Trait Estimation in Multidimensional Forced Choice Measures: Latent Regression Multi-Unidimensional Pairwise Preference Model.","authors":"Sean Joo, Philseok Lee, Stephen Stark","doi":"10.1177/01466216251415189","DOIUrl":"10.1177/01466216251415189","url":null,"abstract":"<p><p>The field of psychometrics has made remarkable progress in developing item response theory (IRT) models for analyzing multidimensional forced choice (MFC) measures. This study introduces an innovative method that enhances the latent trait estimation of the Multi-Unidimensional Pairwise Preference (MUPP) model by incorporating latent regression modeling. To validate the efficacy of the new method, we conducted a comprehensive simulation study. The results of the study provide compelling evidence that the proposed latent regression MUPP (LR-MUPP) model significantly improves the accuracy of the latent trait estimation. This study opens new avenues for future research and encourages further development and refinement of MFC IRT models and their applications.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251415189"},"PeriodicalIF":1.2,"publicationDate":"2026-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12764422/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145907257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Can the Generalized Graded Unfolding Model Fit Dominance Responses?
IF 1.2 4区 心理学 Q4 PSYCHOLOGY, MATHEMATICAL Pub Date : 2025-12-07 DOI: 10.1177/01466216251401214
Jianbin Fu, Xuan Tan, Patrick C Kyllonen

Theoretically, the generalized graded unfolding model (GGUM) is more flexible than the generalized partial credit model (GPCM), a dominance model. For item responses generated by the GPCM, GGUM estimation can produce item response curves that overlap with those from the GPCM over a range of latent trait scores covering almost the entire population. The discrimination and category threshold estimates from the two models are approximately equal. In GGUM estimation of GPCM items, it is necessary to use an informative prior around an extreme location (e.g., 4 for a positive GPCM item) or to fix the extreme locations in order to achieve the desired estimates. A simulation study and applications to two real datasets support these theoretical claims. Various practical implications are discussed, and suggestions for future research are provided.
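The GPCM item response curves compared in this abstract have a closed form: P(X = k | θ) is proportional to the exponential of the cumulative sum of a(θ − b_j) up to category k. The sketch below is a minimal illustration of that formula, not code from the article; the function name and the item parameters are invented.

```python
import math

def gpcm_probs(theta, a, b):
    """Category probabilities under the generalized partial credit model.
    `a` is the discrimination; `b` lists the step (threshold) parameters.
    P(X=k|theta) is proportional to exp(sum_{j<=k} a*(theta - b_j)),
    with the empty sum for category 0 taken as 0."""
    logits = [0.0]
    for b_j in b:
        logits.append(logits[-1] + a * (theta - b_j))
    denom = sum(math.exp(z) for z in logits)
    return [math.exp(z) / denom for z in logits]

# A three-category item evaluated at theta = 0.5
probs = gpcm_probs(theta=0.5, a=1.2, b=[-0.5, 0.8])
print(probs, sum(probs))  # category probabilities sum to 1
```

Plotting these probabilities over a grid of θ values gives the item response curves that the GGUM fit is expected to reproduce over most of the latent trait range.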

{"title":"Can the Generalized Graded Unfolding Model Fit Dominance Responses?","authors":"Jianbin Fu, Xuan Tan, Patrick C Kyllonen","doi":"10.1177/01466216251401214","DOIUrl":"10.1177/01466216251401214","url":null,"abstract":"<p><p>Theoretically, the generalized graded unfolding model (GGUM) is more flexible than the generalized partial credit model (GPCM), a dominance model. For item responses generated by the GPCM, the GGUM estimations can generate overlapping item response curves with those from the GPCM over a range of latent trait scores covering almost all of the population. The discrimination and category threshold estimates from the two models are approximately equal. It is necessary to use an informative prior around an extreme location (e.g., 4 for a positive GPCM item) or fix the extreme locations in the GGUM estimation of GPCM items to achieve the desired estimation. The simulation study and the applications on two real datasets support the theoretical claims. Various practical implications are discussed, and suggestions for future research are provided.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":" ","pages":"01466216251401214"},"PeriodicalIF":1.2,"publicationDate":"2025-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12682685/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145716376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}