
Latest Publications from Applied Psychological Measurement

Item Response Modeling of Clinical Instruments With Filter Questions: Disentangling Symptom Presence and Severity.
IF 1.0 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2024-09-01 | Epub Date: 2024-06-17 | DOI: 10.1177/01466216241261709
Brooke E Magnus

Clinical instruments that use a filter/follow-up response format often produce data with excess zeros, especially when administered to nonclinical samples. When the unidimensional graded response model (GRM) is then fit to these data, parameter estimates and scale scores tend to suggest that the instrument measures individual differences only among individuals with severe levels of the psychopathology. In such scenarios, alternative item response models that explicitly account for excess zeros may be more appropriate. The multivariate hurdle graded response model (MH-GRM), which has been previously proposed for handling zero-inflated questionnaire data, includes two latent variables: susceptibility, which underlies responses to the filter question, and severity, which underlies responses to the follow-up question. Using both simulated and empirical data, the current research shows that compared to unidimensional GRMs, the MH-GRM is better able to capture individual differences across a wider range of psychopathology, and that when unidimensional GRMs are fit to data from questionnaires that include filter questions, individual differences at the lower end of the severity continuum largely go unmeasured. Practical implications are discussed.
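To make the hurdle structure concrete, the following is a minimal Python sketch of the two-stage data-generating process the abstract describes: a 2PL-style filter stage driven by susceptibility, and a graded follow-up stage driven by severity for those who endorse the filter. The helper names, parameter values, and the 0.5 trait correlation are illustrative assumptions, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(1)

def p_filter(theta, a, b):
    """2PL-style probability of endorsing the filter question (susceptibility stage)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def grm_probs(theta, a, thresholds):
    """Graded response model category probabilities for the follow-up item."""
    cum = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - thresholds[None, :])))
    cum = np.hstack([np.ones((theta.size, 1)), cum, np.zeros((theta.size, 1))])
    return cum[:, :-1] - cum[:, 1:]  # P(X = k) from adjacent cumulative curves

n = 5000
susceptibility = rng.normal(size=n)  # underlies the filter response
# severity correlates with susceptibility but is a distinct latent variable
severity = 0.5 * susceptibility + rng.normal(scale=np.sqrt(0.75), size=n)

endorsed = rng.random(n) < p_filter(susceptibility, a=1.5, b=0.8)
probs = grm_probs(severity, a=1.8, thresholds=np.array([-0.5, 0.5, 1.5]))
followup = np.array([rng.choice(4, p=pi) for pi in probs])

# Hurdle: non-endorsers sit at zero, which is what produces the excess zeros
response = np.where(endorsed, followup + 1, 0)
print(np.bincount(response, minlength=5) / n)  # large spike in category 0
```

Fitting a unidimensional GRM to `response` would conflate the two stages; the MH-GRM keeps them separate.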

Citations: 0
A Note on Standard Errors for Multidimensional Two-Parameter Logistic Models Using Gaussian Variational Estimation
IF 1.0 | CAS Zone 4 (Psychology) | Q4 PSYCHOLOGY, MATHEMATICAL | Pub Date: 2024-07-24 | DOI: 10.1177/01466216241265757
Jiaying Xiao, Chun Wang, Gongjun Xu
Accurate item parameters and standard errors (SEs) are crucial for many multidimensional item response theory (MIRT) applications. A recent study proposed the Gaussian Variational Expectation Maximization (GVEM) algorithm to improve computational efficiency and estimation accuracy (Cho et al., 2021). However, the SE estimation procedure has yet to be fully addressed. To tackle this issue, the present study proposed an updated supplemented expectation maximization (USEM) method and a bootstrap method for SE estimation. These two methods were compared in terms of SE recovery accuracy. The simulation results demonstrated that the GVEM algorithm with bootstrap and item priors (GVEM-BSP) outperformed the other methods, exhibiting less bias and relative bias for SE estimates under most conditions. Although the GVEM with USEM (GVEM-USEM) was the most computationally efficient method, it yielded an upward bias for SE estimates.
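The bootstrap component is generic enough to sketch: resample persons with replacement, refit the model to each resample, and take the standard deviation of the estimates across replications. In this minimal Python skeleton, a classical proportion-correct statistic stands in for the GVEM item-parameter fit, which is not implemented here; `bootstrap_se` and the toy `fit` are my own names.

```python
import numpy as np

def bootstrap_se(responses, fit_model, n_boot=200, seed=0):
    """Nonparametric bootstrap SEs: resample persons (rows), refit,
    and take the SD of each parameter estimate across replications."""
    rng = np.random.default_rng(seed)
    n_persons = responses.shape[0]
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_persons, size=n_persons)  # resample with replacement
        estimates.append(fit_model(responses[idx]))
    return np.asarray(estimates).std(axis=0, ddof=1)

# Toy stand-in for the refit step; in the article this would be the
# GVEM estimator for the multidimensional 2PL.
fit = lambda data: data.mean(axis=0)

rng = np.random.default_rng(1)
data = (rng.random((1000, 10)) < 0.7).astype(int)
print(bootstrap_se(data, fit))  # close to sqrt(p * (1 - p) / n) per item
```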
Citations: 0
Measurement Invariance Testing Works
IF 1.2 | CAS Zone 4 (Psychology) | Q2 Social Sciences | Pub Date: 2024-06-14 | DOI: 10.1177/01466216241261708
J. Lasker
Psychometricians have argued that measurement invariance (MI) testing is needed to know whether the same psychological constructs are measured in different groups. Data from five experiments allowed that position to be tested. In the first, participants answered questionnaires on belief in free will and either the meaning of life or the meaning of a nonsense concept called “gavagai.” Since the meaning of life and the meaning of gavagai conceptually differ, MI should have been violated when the groups were treated as though their measurements were identical. MI was severely violated, indicating the questionnaires were interpreted differently. In the second and third experiments, participants were randomized to watch treatment videos explaining figural matrices rules or task-irrelevant control videos. Participants then took intelligence and figural matrices tests. The intervention worked: knowledge of the matrix rules gave the experimental group an additional influence on figural matrix performance, so their performance on the matrices tests violated MI and was anomalously high for their intelligence levels. In both experiments, MI was severely violated. In the fourth and fifth experiments, individuals were exposed to growth mindset interventions that a twin study revealed changed the amount of genetic variance in the target mindset measure without affecting other variables. When comparing treatment and control groups, MI was attainable before but not after treatment. Moreover, the control group showed longitudinal invariance, but the treatment group did not. MI testing is thus likely able to show whether the same things are measured in different groups.
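A toy Python illustration conveys the logic of why such violations are detectable: match respondents on their rest score and check whether groups at the same matched level respond alike to each item. This is a rough analogue of invariance checking, not the nested multi-group factor-model comparison used in formal MI testing, and all data and numbers below are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n)  # 0 = control, 1 = treated

# Six congeneric items; item 0 gets an intercept shift in group 1, so the
# same trait level no longer implies the same expected response (MI violated).
items = 0.8 * theta[:, None] + rng.normal(scale=0.6, size=(n, 6))
items[group == 1, 0] += 0.5

for j in range(6):
    rest = items.sum(axis=1) - items[:, j]  # matching variable
    bins = np.digitize(rest, np.quantile(rest, [0.2, 0.4, 0.6, 0.8]))
    gaps = [items[(bins == b) & (group == 1), j].mean()
            - items[(bins == b) & (group == 0), j].mean() for b in range(5)]
    print(f"item {j}: mean within-bin group gap = {np.mean(gaps):+.2f}")
# item 0 shows a clear positive gap; the other items' gaps are much smaller
# (slightly negative, because item 0 contaminates their matching score)
```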
Citations: 0
Accommodating and Extending Various Models for Special Effects Within the Generalized Partially Confirmatory Factor Analysis Framework
IF 1.2 | CAS Zone 4 (Psychology) | Q2 Social Sciences | Pub Date: 2024-06-12 | DOI: 10.1177/01466216241261704
Yifan Zhang, Jinsong Chen
Special measurement effects including the method and testlet effects are common issues in educational and psychological measurement. They are typically covered by various bifactor models or models for the multiple traits multiple methods (MTMM) structure for continuous data and by various testlet effect models for categorical data. However, existing models have some limitations in accommodating different types of effects. With slight modification, the generalized partially confirmatory factor analysis (GPCFA) framework can flexibly accommodate special effects for continuous and categorical cases with added benefits. Various bifactor, MTMM, and testlet effect models can be linked to different variants of the revised GPCFA model. Compared to existing approaches, GPCFA offers multidimensionality for both the general and effect factors (or traits) and can address local dependence, mixed-type formats, and missingness jointly. Moreover, the partially confirmatory approach allows for regularization of the loading patterns, resulting in a simpler structure in both the general and special parts. We also provide a subroutine to compute the equivalent effect size. Simulation studies and real-data examples are used to demonstrate the performance and usefulness of the proposed approach under different situations.
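The loading-regularization idea can be sketched compactly. In this toy Python version, factor scores are treated as known and each item's loadings are estimated with an L1 penalty (via scikit-learn, assumed available), which shrinks negligible loadings to exactly zero. The real GPCFA framework estimates loadings and latent variables jointly in a Bayesian model, so this only illustrates the sparsity mechanism.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
n = 2000
F = rng.normal(size=(n, 2))  # column 0: general trait; column 1: effect factor

# True pattern: all items load on the trait; items 3-5 also carry a
# testlet/method effect, so many candidate loadings are truly zero.
L_true = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.0],
                   [0.8, 0.5], [0.7, 0.6], [0.9, 0.5]])
Y = F @ L_true.T + rng.normal(scale=0.5, size=(n, 6))

# The L1 penalty recovers the sparse loading pattern without fixing
# any loadings to zero a priori.
L_hat = np.vstack([Lasso(alpha=0.05).fit(F, Y[:, j]).coef_ for j in range(6)])
print(np.round(L_hat, 2))
```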
Citations: 0
Investigating Directional Invariance in an Item Response Tree Model for Extreme Response Style and Trait-Based Unfolding Responses
IF 1.2 | CAS Zone 4 (Psychology) | Q2 Social Sciences | Pub Date: 2024-06-11 | DOI: 10.1177/01466216241261705
Siqi He, Justin L. Kern
Item response tree (IRTree) approaches have received increasing attention in the response style literature due to their capability to partial out response style latent traits from content-related latent traits by considering separate decisions for agreement and level of agreement. Additionally, it has been shown that the functioning of the intensity-of-agreement decision may depend upon the agreement decision with an item, so that the item parameters and person parameters may differ by direction of agreement; when the parameters are the same across directions, this is called directional invariance. Furthermore, for non-cognitive psychological constructs, it has been argued that the response process may be best described as following an unfolding process. In this study, a family of IRTree models to handle unfolding responses with the agreement decision following the hyperbolic cosine model and the intensity of agreement decision following a graded response model is investigated. This model family also allows for investigation of item- and person-level directional invariance. A simulation study is conducted to evaluate parameter recovery; model parameters are estimated with a fully Bayesian approach using JAGS (Just Another Gibbs Sampler). The proposed modeling scheme is demonstrated with two data examples with multiple model comparisons allowing for varying levels of directional invariance and unfolding versus dominance processes. An approach to visualizing the final model item response functioning is also developed. The article closes with a short discussion about the results.
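The first-stage unfolding mechanism is easy to display. Below is a small Python sketch of the Andrich-Luo hyperbolic cosine model for the agreement decision; under the IRTree scheme described in the article, the intensity decision would then follow a graded response model conditional on agreement. Parameter values are arbitrary, and `hcm_agree` is my own name.

```python
import numpy as np

def hcm_agree(theta, delta, gamma):
    """Hyperbolic cosine model: P(agree) peaks when the person location
    theta is near the item location delta (an unfolding, not dominance,
    curve). gamma is the unit (latitude) parameter governing peak height."""
    return np.exp(gamma) / (np.exp(gamma) + 2.0 * np.cosh(theta - delta))

theta = np.linspace(-4, 4, 9)
print(np.round(hcm_agree(theta, delta=1.0, gamma=1.2), 2))
# single-peaked at theta = delta: people far above OR far below the item
# location tend to disagree, unlike a monotone 2PL curve
```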
Citations: 0
aberrance: An R Package for Detecting Aberrant Behavior in Test Data
IF 1.2 | CAS Zone 4 (Psychology) | Q2 Social Sciences | Pub Date: 2024-06-05 | DOI: 10.1177/01466216241261707
Kylie Gorney, Jiayi Deng
{"title":"aberrance: An R Package for Detecting Aberrant Behavior in Test Data","authors":"Kylie Gorney, Jiayi Deng","doi":"10.1177/01466216241261707","DOIUrl":"https://doi.org/10.1177/01466216241261707","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141385802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Are Large-Scale Test Scores Comparable for At-Home Versus Test Center Testing?
IF 1.2 | CAS Zone 4 (Psychology) | Q2 Social Sciences | Pub Date: 2024-05-11 | DOI: 10.1177/01466216241253795
Katherine E. Castellano, Matthew S. Johnson, Rene Lawless
The COVID-19 pandemic led to a proliferation of remote-proctored (or “at-home”) assessments. The lack of standardized setting, device, or in-person proctor during at-home testing makes it markedly distinct from testing at a test center. Comparability studies of at-home and test center scores are important in understanding whether these distinctions impact test scores. This study found no significant differences in at-home versus test center test scores on a large-scale admissions test using either a randomized controlled trial or an observational study after adjusting for differences in sample composition along baseline characteristics.
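A minimal Python sketch of the observational side of such a comparison: regress scores on a modality indicator plus baseline covariates and inspect the adjusted modality effect. The data are simulated to mirror the article's no-difference finding; the covariate, the effect sizes, and the use of statsmodels are my assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
baseline = rng.normal(size=n)                 # e.g., prior achievement composite
at_home = (rng.random(n) < 0.5).astype(float)  # 1 = remote-proctored session

# Scores depend on the baseline characteristic but not on modality,
# mirroring the finding of no modality effect after adjustment.
score = 100 + 8 * baseline + rng.normal(scale=5, size=n)

X = sm.add_constant(np.column_stack([at_home, baseline]))
fit = sm.OLS(score, X).fit()
print(f"adjusted modality effect: {fit.params[1]:+.2f} (SE {fit.bse[1]:.2f})")
```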
Citations: 0
Test Security and the Pandemic: Comparison of Test Center and Online Proctor Delivery Modalities
IF 1.2 | CAS Zone 4 (Psychology) | Q2 Social Sciences | Pub Date: 2024-04-23 | DOI: 10.1177/01466216241248826
Kirk A. Becker, Jinghua Liu, Paul E. Jones
Published information is limited regarding the security of testing programs, and even less is available on the relative security of different testing modalities: in-person testing at test centers (TC) versus remote online proctored (OP) testing. This article begins by examining indicators of test security violations across a wide range of programs in professional, admissions, and IT fields. We look at high levels of response overlap as a potential indicator of collusion to cheat on the exam and compare rates by modality and between test center types. Next, we scrutinize indicators of potential test security violations for a single large testing program over the course of 14 months, during which the program went from exclusively in-person TC testing to a mix of OP and TC testing. Test security indicators include high response overlap, large numbers of fast correct responses, large numbers of slow correct responses, large test-retest score gains, unusually fast response times for passing candidates, and measures of differential person functioning. These indicators are examined and compared prior to and after the introduction of OP testing. In addition, test-retest modality is examined for candidates who fail and retest subsequent to the introduction of OP testing, with special attention paid to test takers who change modality between the initial attempt and the retest. These data allow us to understand whether indications of content exposure increase with the introduction of OP testing, and whether testing modalities affect potential score increases in a similar way.
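Response overlap, the first indicator mentioned, is straightforward to compute. Here is a Python sketch on simulated multiple-choice data with one planted high-overlap pair; the 90% flagging threshold is arbitrary, and an operational program would instead model the overlap expected by chance given ability.

```python
import numpy as np

rng = np.random.default_rng(4)
n_examinees, n_items = 200, 60
responses = rng.integers(0, 4, size=(n_examinees, n_items))  # options A-D

# Plant a near-identical pair, as collusion or shared pre-knowledge might produce
responses[1] = responses[0]
responses[1, :5] = (responses[1, :5] + 1) % 4

# Proportion of items with identical selected options, for every pair
same = (responses[:, None, :] == responses[None, :, :]).mean(axis=2)
i, j = np.triu_indices(n_examinees, k=1)
print(f"typical pairwise overlap: {same[i, j].mean():.2f}")  # ~0.25 by chance
flagged = [(int(a), int(b)) for a, b in zip(i, j) if same[a, b] > 0.90]
print(f"flagged pairs: {flagged}")  # the planted pair (0, 1)
```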
Citations: 0
How Scoring Approaches Impact Estimates of Growth in the Presence of Survey Item Ceiling Effects
IF 1.2 | CAS Zone 4 (Psychology) | Q2 Social Sciences | Pub Date: 2024-03-16 | DOI: 10.1177/01466216241238749
Kelly D. Edwards, J. Soland
Survey scores are often the basis for understanding how individuals grow psychologically and socio-emotionally. A known problem with many surveys is that the items are all “easy”—that is, individuals tend to use only the top one or two response categories on the Likert scale. Such an issue could be especially problematic, and lead to ceiling effects, when the same survey is administered repeatedly over time. In this study, we conduct simulation and empirical studies to (a) quantify the impact of these ceiling effects on growth estimates when using typical scoring approaches like sum scores and unidimensional item response theory (IRT) models and (b) examine whether approaches to survey design and scoring, including employing various longitudinal multidimensional IRT (MIRT) models, can mitigate any bias in growth estimates. We show that bias is substantial when using typical scoring approaches and that, while lengthening the survey helps somewhat, using a longitudinal MIRT model with plausible-values scoring all but eliminates the issue. Results have implications for scoring surveys in growth studies going forward, as well as for understanding how Likert item ceiling effects may be contributing to replication failures.
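The ceiling mechanism itself is easy to reproduce. A Python sketch, with invented thresholds, showing how uniformly 'easy' Likert items compress sum-score growth relative to a true latent gain of 0.5 SD:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_items = 3000, 10
theta1 = rng.normal(size=n)
theta2 = theta1 + 0.5  # true latent growth: 0.5 SD

# "Easy" items: all thresholds sit well below the trait mean, so most
# respondents use only the top one or two of the five categories (0-4).
thresholds = np.array([-2.5, -1.8, -1.1, -0.4])

def sum_scores(theta):
    latent = theta[:, None] + rng.logistic(size=(theta.size, n_items))
    return (latent[:, :, None] > thresholds).sum(axis=2).sum(axis=1)

s1, s2 = sum_scores(theta1), sum_scores(theta2)
print(f"growth in score SD units: {(s2.mean() - s1.mean()) / s1.std():.2f}")
print(f"share at the ceiling: t1 = {(s1 == 40).mean():.2f}, "
      f"t2 = {(s2 == 40).mean():.2f}")
# with this strong a ceiling, the measured gain falls below the true 0.5 SD
```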
Citations: 0
Evaluating the Douglas-Cohen IRT Goodness of Fit Measure With BIB Sampling of Items
IF 1.2 | CAS Zone 4 (Psychology) | Q2 Social Sciences | Pub Date: 2024-03-14 | DOI: 10.1177/01466216241238740
John R. Donoghue, Adrienne N. Sgammato
Methods to detect item response theory (IRT) item-level misfit are typically derived assuming fixed test forms. However, IRT is also employed with more complicated test designs, such as the balanced incomplete block (BIB) design used in large-scale educational assessments. This study investigates two modifications of Douglas and Cohen’s 2001 nonparametric method of assessing item misfit, based on (A) using block total scores and (B) pooling booklet-level scores for analyzing BIB data. Block-level scores showed extreme inflation of Type I error for short blocks containing 5 or 10 items. The pooled booklet method yielded Type I error rates close to the nominal α level in most conditions and had power to detect misfitting items. The study also found that the Douglas and Cohen procedure is only slightly affected by the presence of other misfitting items in the block. The pooled booklet method is recommended for practical applications of Douglas and Cohen’s method with BIB data.
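A toy Python analogue of the underlying fit logic: group examinees on total score and compare each item's observed proportion correct with the model-implied proportion within each group. Douglas and Cohen's actual statistic compares kernel-smoothed nonparametric ICCs with fitted IRT curves, and the pooling in the article operates on booklet-level scores, so this sketch with invented parameters only shows the observed-versus-expected comparison.

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_items = 5000, 20
theta = rng.normal(size=n)
a = rng.uniform(0.8, 2.0, size=n_items)
b = rng.normal(size=n_items)
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))  # 2PL response probabilities

x = (rng.random((n, n_items)) < p).astype(int)
x[:, 0] = (rng.random(n) < 0.6).astype(int)  # item 0 ignores theta: misfit

total = x.sum(axis=1)
edges = np.quantile(total, [0.2, 0.4, 0.6, 0.8])
groups = np.digitize(total, edges)  # five score groups

for j in (0, 1):
    obs = np.array([x[groups == g, j].mean() for g in range(5)])
    exp = np.array([p[groups == g, j].mean() for g in range(5)])
    print(f"item {j}: max |observed - expected| = {np.abs(obs - exp).max():.2f}")
# the misfitting item 0 shows large discrepancies; item 1 stays small
```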
Citations: 0