Psychometrika: Latest Articles

Rejoinder to McNeish and Mislevy: What Does Psychological Measurement Require?
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-10-30 DOI: 10.1007/s11336-024-10004-7
Klaas Sijtsma, Jules L Ellis, Denny Borsboom

In this rejoinder to McNeish (2024) and Mislevy (2024), who both responded to our focus article on the merits of the simple sum score (Sijtsma et al., 2024), we address several issues. Psychometrics education and in particular psychometricians' outreach may help researchers to use IRT models as a precursor for the responsible use of the latent variable score and the sum score. Different methods used for test and questionnaire construction often do not produce highly different results, and when they do, this may be due to an unarticulated attribute theory generating noisy data. The sum score and transformations thereof, such as normalized test scores and percentiles, may help test practitioners and their clients to better communicate results. Latent variables prove important in more advanced applications such as equating and adaptive testing where they serve as technical tools rather than communication devices. Decisions based on test results are often binary or use a rather coarse ordering of scale levels, hence, do not require a high level of granularity (but nevertheless need to be precise). A gap exists between psychology and psychometrics which is growing deeper and wider, and that needs to be bridged. Psychology and psychometrics must work together to attain this goal.

Citations: 0
Are Sum Scores a Great Accomplishment of Psychometrics or Intuitive Test Theory?
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-10-22 DOI: 10.1007/s11336-024-10003-8
Robert J Mislevy

Sijtsma, Ellis, and Borsboom (Psychometrika, 89:84-117, 2024. https://doi.org/10.1007/s11336-024-09964-7 ) provide a thoughtful treatment in Psychometrika of the value and properties of sum scores and classical test theory at a depth at which few practicing psychometricians are familiar. In this note, I offer comments on their article from the perspective of evidentiary reasoning.

Citations: 0
Correction to: Generalized Structured Component Analysis Accommodating Convex Components: A Knowledge-Based Multivariate Method with Interpretable Composite Indexes.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-14 DOI: 10.1007/s11336-024-10001-w
Gyeongcheol Cho, Heungsun Hwang
Citations: 0
Proof of Reliability Convergence to 1 at Rate of Spearman-Brown Formula for Random Test Forms and Irrespective of Item Pool Dimensionality.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-01 Epub Date: 2024-03-12 DOI: 10.1007/s11336-024-09956-7
Jules L Ellis, Klaas Sijtsma

It is shown that the psychometric test reliability, based on any true-score model with randomly sampled items and uncorrelated errors, converges to 1 as the test length goes to infinity, with probability 1, assuming some general regularity conditions. The asymptotic rate of convergence is given by the Spearman-Brown formula, and for this it is not needed that the items are parallel, or latent unidimensional, or even finite dimensional. Simulations with the 2-parameter logistic item response theory model reveal that the reliability of short multidimensional tests can be positively biased, meaning that applying the Spearman-Brown formula in these cases would lead to overprediction of the reliability that results from lengthening a test. However, test constructors of short tests generally aim for short tests that measure just one attribute, so that the bias problem may have little practical relevance. For short unidimensional tests under the 2-parameter logistic model reliability is almost unbiased, meaning that application of the Spearman-Brown formula in these cases of greater practical utility leads to predictions that are approximately unbiased.
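For readers unfamiliar with the Spearman-Brown formula mentioned in this abstract, the following is a minimal sketch of the classical prediction, not code from the article. The function name `spearman_brown` and its parameters are illustrative assumptions. Given a reliability `rho` and a lengthening factor `k`, the predicted reliability is `k * rho / (1 + (k - 1) * rho)`, which approaches 1 as `k` grows.

```python
def spearman_brown(rho: float, k: float) -> float:
    """Classical Spearman-Brown prediction (illustrative helper, not from the article).

    rho: reliability of the current test, in [0, 1].
    k:   factor by which the test length is multiplied (k > 0).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * rho / (1.0 + (k - 1.0) * rho)


# Predicted reliability for increasingly long tests, starting from rho = 0.5.
for k in (1, 2, 4, 16, 256):
    print(k, round(spearman_brown(0.5, k), 4))
```

The loop illustrates the convergence toward 1 with increasing test length; the abstract's caveat about positive bias for short multidimensional tests concerns how well this prediction holds in practice, which the sketch does not address.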

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11458731/pdf/
Citations: 0
Diagnostic Classification Models for Testlets: Methods and Theory.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-01 Epub Date: 2024-03-26 DOI: 10.1007/s11336-024-09962-9
Xin Xu, Guanhua Fang, Jinxin Guo, Zhiliang Ying, Susu Zhang

Diagnostic classification models (DCMs) have seen wide applications in educational and psychological measurement, especially in formative assessment. DCMs in the presence of testlets have been studied in recent literature. A key ingredient in the statistical modeling and analysis of testlet-based DCMs is the superposition of two latent structures, the attribute profile and the testlet effect. This paper extends the standard testlet DINA (T-DINA) model to accommodate the potential correlation between the two latent structures. Model identifiability is studied and a set of sufficient conditions are proposed. As a byproduct, the identifiability of the standard T-DINA is also established. The proposed model is applied to a dataset from the 2015 Programme for International Student Assessment. Comparisons are made with DINA and T-DINA, showing that there is substantial improvement in terms of the goodness of fit. Simulations are conducted to assess the performance of the new method under various settings.

Citations: 0
Signal-to-Noise Ratio in Estimating and Testing the Mediation Effect: Structural Equation Modeling versus Path Analysis with Weighted Composites.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-01 Epub Date: 2024-05-28 DOI: 10.1007/s11336-024-09975-4
Ke-Hai Yuan, Zhiyong Zhang, Lijuan Wang

Mediation analysis plays an important role in understanding causal processes in social and behavioral sciences. While path analysis with composite scores was criticized to yield biased parameter estimates when variables contain measurement errors, recent literature has pointed out that the population values of parameters of latent-variable models are determined by the subjectively assigned scales of the latent variables. Thus, conclusions in existing studies comparing structural equation modeling (SEM) and path analysis with weighted composites (PAWC) on the accuracy and precision of the estimates of the indirect effect in mediation analysis have little validity. Instead of comparing the size on estimates of the indirect effect between SEM and PAWC, this article compares parameter estimates by signal-to-noise ratio (SNR), which does not depend on the metrics of the latent variables once the anchors of the latent variables are determined. Results show that PAWC yields greater SNR than SEM in estimating and testing the indirect effect even when measurement errors exist. In particular, path analysis via factor scores almost always yields greater SNRs than SEM. Mediation analysis with equally weighted composites (EWCs) also more likely yields greater SNRs than SEM. Consequently, PAWC is statistically more efficient and more powerful than SEM in conducting mediation analysis in empirical research. The article also further studies conditions that cause SEM to have smaller SNRs, and results indicate that the advantage of PAWC becomes more obvious when there is a strong relationship between the predictor and the mediator, whereas the size of the prediction error in the mediator adversely affects the performance of the PAWC methodology. Results of a real-data example also support the conclusions.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11458674/pdf/
Citations: 0
The InterModel Vigorish as a Lens for Understanding (and Quantifying) the Value of Item Response Models for Dichotomously Coded Items.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-01 Epub Date: 2024-06-03 DOI: 10.1007/s11336-024-09977-2
Benjamin W Domingue, Klint Kanopka, Radhika Kapoor, Steffi Pohl, R Philip Chalmers, Charles Rahal, Mijke Rhemtulla

The deployment of statistical models, such as those used in item response theory, necessitates the use of indices that are informative about the degree to which a given model is appropriate for a specific data context. We introduce the InterModel Vigorish (IMV) as an index that can be used to quantify accuracy for models of dichotomous item responses based on the improvement across two sets of predictions (i.e., predictions from two item response models or predictions from a single such model relative to prediction based on the mean). This index has a range of desirable features: It can be used for the comparison of non-nested models and its values are highly portable and generalizable. We use this fact to compare predictive performance across a variety of simulated data contexts and also demonstrate qualitative differences in behavior between the IMV and other common indices (e.g., the AIC and RMSEA). We also illustrate the utility of the IMV in empirical applications with data from 89 dichotomous item response datasets. These empirical applications help illustrate how the IMV can be used in practice and substantiate our claims regarding various aspects of model performance. These findings indicate that the IMV may be a useful indicator in psychometrics, especially as it allows for easy comparison of predictions across a variety of contexts.
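The abstract does not spell out how the IMV is computed. The sketch below reflects one common description of the construction and should be treated as an assumption to check against the article: each set of predictions is mapped to the weight `w` of a biased coin whose expected log-likelihood equals the mean log-likelihood of the predictions, and the IMV is the relative gain `(w1 - w0) / w0` of the enhanced predictions over the baseline. The function names `coin_weight` and `imv` are hypothetical, and the example scores in-sample predictions purely for brevity, whereas held-out predictions would normally be used.

```python
import math


def coin_weight(outcomes, probs):
    """Weight w in [0.5, 1) solving w*log(w) + (1-w)*log(1-w) = mean log-likelihood.

    outcomes: iterable of 0/1 responses.
    probs:    predicted probabilities of a 1, each strictly between 0 and 1.
    """
    mean_ll = sum(
        x * math.log(p) + (1 - x) * math.log(1 - p)
        for x, p in zip(outcomes, probs)
    ) / len(outcomes)
    if mean_ll < math.log(0.5) - 1e-12:
        raise ValueError("predictions are worse than a fair coin")

    def f(w):
        # Expected log-likelihood of a coin with weight w; increasing on [0.5, 1).
        return w * math.log(w) + (1 - w) * math.log(1 - w)

    lo, hi = 0.5, 1 - 1e-12
    for _ in range(200):  # plain bisection
        mid = (lo + hi) / 2
        if f(mid) < mean_ll:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def imv(outcomes, baseline_probs, enhanced_probs):
    """Relative improvement in coin weight of enhanced over baseline predictions."""
    w0 = coin_weight(outcomes, baseline_probs)
    w1 = coin_weight(outcomes, enhanced_probs)
    return (w1 - w0) / w0


# Toy example: baseline predicts the mean (0.75); the enhanced model is sharper.
outcomes = [1, 0, 1, 1]
baseline = [0.75] * 4
enhanced = [0.9, 0.1, 0.9, 0.9]
print(imv(outcomes, baseline, enhanced))
```

Because the index depends only on two sets of predicted probabilities, the same function can compare non-nested models, which is the feature the abstract emphasizes.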

Citations: 0
The Crosswise Model for Surveys on Sensitive Topics: A General Framework for Item Selection and Statistical Analysis.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-01 Epub Date: 2024-05-28 DOI: 10.1007/s11336-024-09976-3
Marco Gregori, Martijn G De Jong, Rik Pieters

When surveys contain direct questions about sensitive topics, participants may not provide their true answers. Indirect question techniques incentivize truthful answers by concealing participants' responses in various ways. The Crosswise Model aims to do this by pairing a sensitive target item with a non-sensitive baseline item, and only asking participants to indicate whether their responses to the two items are the same or different. Selection of the baseline item is crucial to guarantee participants' perceived and actual privacy and to enable reliable estimates of the sensitive trait. This research makes the following contributions. First, it describes an integrated methodology to select the baseline item, based on conceptual and statistical considerations. The resulting methodology distinguishes four statistical models. Second, it proposes novel Bayesian estimation methods to implement these models. Third, it shows that the new models introduced here improve efficiency over common applications of the Crosswise Model and may relax the required statistical assumptions. These three contributions facilitate applying the methodology in a variety of settings. An empirical application on attitudes toward LGBT issues shows the potential of the Crosswise Model. An interactive app, Python and MATLAB codes support broader adoption of the model.
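To make the "same or different" mechanism concrete, here is a minimal sketch of the textbook moment estimator for the basic Crosswise design. It is not the Bayesian methodology proposed in the article, and the function name `crosswise_prevalence` is an illustrative assumption. It assumes the baseline item has a known "yes" probability `p_baseline` (not equal to 0.5) and is independent of the sensitive item, so that P(same) = pi * p + (1 - pi) * (1 - p), which can be solved for the sensitive prevalence pi.

```python
import math


def crosswise_prevalence(n_same: int, n_total: int, p_baseline: float):
    """Moment estimate of the sensitive-trait prevalence and its standard error.

    n_same:     number of respondents reporting "same" for the two items.
    n_total:    number of respondents.
    p_baseline: known probability of "yes" on the non-sensitive baseline item.
    """
    if n_total <= 0 or not 0 <= n_same <= n_total:
        raise ValueError("need 0 <= n_same <= n_total and n_total > 0")
    denom = 2.0 * p_baseline - 1.0
    if abs(denom) < 1e-12:
        raise ValueError("p_baseline = 0.5 makes the prevalence unidentifiable")
    lam = n_same / n_total                      # observed share of "same" answers
    pi_hat = (lam + p_baseline - 1.0) / denom   # invert P(same) for pi
    se = math.sqrt(lam * (1.0 - lam) / (n_total * denom ** 2))
    return pi_hat, se


# Example: 650 of 1000 respondents answer "same" with a baseline probability of 0.25.
estimate, std_err = crosswise_prevalence(650, 1000, 0.25)
print(round(estimate, 3), round(std_err, 3))
```

The standard error grows as `p_baseline` approaches 0.5, which gives one intuition for why the abstract treats the choice of baseline item as crucial for both privacy and estimation precision.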

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11458659/pdf/
Citations: 0
Remarks from the Editor-in-Chief.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-01 DOI: 10.1007/s11336-024-10002-9
Sandip Sinharay
Citations: 0
Differential Item Functioning via Robust Scaling.
IF 2.9 Zone 2 (Psychology) Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-01 Epub Date: 2024-05-04 DOI: 10.1007/s11336-024-09957-6
Peter F Halpin

This paper proposes a method for assessing differential item functioning (DIF) in item response theory (IRT) models. The method does not require pre-specification of anchor items, which is its main virtue. It is developed in two main steps: first by showing how DIF can be re-formulated as a problem of outlier detection in IRT-based scaling and then tackling the latter using methods from robust statistics. The proposal is a redescending M-estimator of IRT scaling parameters that is tuned to flag items with DIF at the desired asymptotic type I error rate. Theoretical results describe the efficiency of the estimator in the absence of DIF and its robustness in the presence of DIF. Simulation studies show that the proposed method compares favorably to currently available approaches for DIF detection, and a real data example illustrates its application in a research context where pre-specification of anchor items is infeasible. The focus of the paper is the two-parameter logistic model in two independent groups, with extensions to other settings considered in the conclusion.

Citations: 0