
Latest Articles in Educational and Psychological Measurement

Discriminant Validity of Interval Response Formats: Investigating the Dimensional Structure of Interval Widths.
IF 2.1 | CAS Zone 3 (Psychology) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2024-11-25 | DOI: 10.1177/00131644241283400
Matthias Kloft, Daniel W Heck

In psychological research, respondents are usually asked to answer questions with a single response value. A useful alternative is interval response formats like the dual-range slider (DRS), where respondents provide an interval with a lower and an upper bound for each item. Interval responses may be used to measure psychological constructs such as variability in the domain of personality (e.g., self-ratings), uncertainty in estimation tasks (e.g., forecasting), and ambiguity in judgments (e.g., concerning the pragmatic use of verbal quantifiers). However, it is unclear whether respondents are sensitive to the requirements of a particular task and whether interval widths actually measure the constructs of interest. To test the discriminant validity of interval widths, we conducted a study in which respondents answered 92 items belonging to seven different tasks from the domains of personality, estimation, and judgment. We investigated the dimensional structure of interval widths by fitting exploratory and confirmatory factor models while using an appropriate multivariate logit function to transform the bounded interval responses. The estimated factorial structure closely followed the theoretically assumed structure of the tasks, which varied in their degree of similarity. We did not find a strong overarching general factor, which speaks against a response style influencing interval widths across all tasks and domains. Overall, this indicates that respondents are sensitive to the requirements of different tasks and domains when using interval response formats.
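The idea of mapping bounded interval responses to unbounded coordinates can be illustrated with a small sketch. This is not necessarily the authors' exact transform; it assumes responses on a [0, 1] scale and uses an additive log-ratio transform of the three proportions an interval induces (area below, interval width, area above), which is one natural multivariate logit for this kind of compositional data:

```python
import math

def interval_to_logratio(lower, upper, eps=1e-6):
    """Map an interval response (lower, upper) on the [0, 1] scale to two
    unbounded coordinates via an additive log-ratio transform."""
    below = max(lower, eps)          # proportion of the scale below the interval
    width = max(upper - lower, eps)  # the interval width itself
    above = max(1.0 - upper, eps)    # proportion of the scale above the interval
    total = below + width + above    # renormalize in case clipping was needed
    below, width, above = below / total, width / total, above / total
    return math.log(below / above), math.log(width / above)

def logratio_to_interval(y1, y2):
    """Inverse transform: recover (lower, upper) from the two coordinates."""
    below, width, above = math.exp(y1), math.exp(y2), 1.0
    total = below + width + above
    lower = below / total
    return lower, lower + width / total
```

The second coordinate isolates the (transformed) interval width, which is the quantity whose dimensional structure the study examines.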

Citations: 0
Novick Meets Bayes: Improving the Assessment of Individual Students in Educational Practice and Research by Capitalizing on Assessors' Prior Beliefs.
IF 2.1 | CAS Zone 3 (Psychology) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2024-11-25 | DOI: 10.1177/00131644241296139
Steffen Zitzmann, Gabe A Orona, Julian F Lohmann, Christoph König, Lisa Bardach, Martin Hecht

The assessment of individual students is not only crucial in the school setting but also at the core of educational research. Although classical test theory focuses on maximizing insights from student responses, the Bayesian perspective incorporates the assessor's prior belief, thereby enriching assessment with knowledge gained from previous interactions with the student or with similar students. We propose and illustrate a formal Bayesian approach that not only allows assessors to form a stronger belief about a student's competency but also offers a more accurate assessment than classical test theory. In addition, we propose a straightforward method for gauging prior beliefs using two specific items and point to the possibility of integrating additional information.

Citations: 0
Differential Item Functioning Effect Size Use for Validity Information.
IF 2.1 | CAS Zone 3 (Psychology) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2024-11-22 | DOI: 10.1177/00131644241293694
W Holmes Finch, Maria Dolores Hidalgo Montesinos, Brian F French, Maria Hernandez Finch

There has been an emphasis on effect sizes for differential item functioning (DIF), with the purpose of understanding the magnitude of the differences detected through statistical significance testing. Several different effect sizes have been suggested that correspond to the method used for analysis, as have different guidelines for interpretation. The purpose of this simulation study was to compare the performance of the DIF effect size measures described for quantifying and comparing the amount of DIF in two assessments. Several factors were manipulated that were thought to influence the effect sizes or are known to influence DIF detection. This study asked the following two questions. First, do the effect sizes accurately capture aggregate DIF across items? Second, do effect sizes accurately identify which assessment has the least amount of DIF? We highlight effect sizes that performed well across several simulated conditions. We also apply these effect sizes to a real data set to provide an example. Results of the study revealed that the mean fixed-effects log odds ratio (Ln ŌR_FE) and the variance of the Mantel-Haenszel log odds ratio (τ̂²) were most accurate for identifying which test contains more DIF. We point to future directions with this work to aid the continued focus on effect sizes to understand DIF magnitude.
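As a concrete illustration of one quantity behind these effect sizes, the Mantel-Haenszel common odds ratio pools 2x2 tables (group by correct/incorrect) across matched score levels, and its log is a standard DIF effect size; τ̂² in the abstract is the variance of such log odds ratios across items. A minimal sketch with made-up tables:

```python
import math

def mantel_haenszel_log_or(tables):
    """Mantel-Haenszel common log odds ratio.

    tables: list of 2x2 counts ((a, b), (c, d)) per matched score level,
    rows = reference/focal group, columns = correct/incorrect."""
    num = den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        num += a * d / n  # reference-correct x focal-incorrect
        den += b * c / n  # reference-incorrect x focal-correct
    return math.log(num / den)
```

A value of 0 corresponds to no uniform DIF; positive values favor the reference group after matching on total score.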

Citations: 0
Optimal Number of Replications for Obtaining Stable Dynamic Fit Index Cutoffs.
IF 2.1 | CAS Zone 3 (Psychology) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2024-11-08 | DOI: 10.1177/00131644241290172
Xinran Liu, Daniel McNeish

Factor analysis is commonly used in behavioral sciences to measure latent constructs, and researchers routinely consider approximate fit indices to ensure adequate model fit and to provide important validity evidence. Due to a lack of generalizable fit index cutoffs, methodologists suggest simulation-based methods to create customized cutoffs that allow researchers to assess model fit more accurately. However, simulation-based methods are computationally intensive. An open question is: How many simulation replications are needed for these custom cutoffs to stabilize? This Monte Carlo simulation study focuses on one such simulation-based method, dynamic fit index (DFI) cutoffs, to determine the optimal number of replications for obtaining stable cutoffs. Results indicated that the DFI approach generates stable cutoffs with 500 replications (the currently recommended number), but the process can be more efficient with fewer replications, especially in simulations with categorical data. Using fewer replications significantly reduces the computational time for determining cutoff values with minimal impact on the results. For one-factor or three-factor models, results suggested that in most conditions 200 DFI replications were optimal for balancing fit index cutoff stability and computational efficiency.
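The replication question can be made concrete with a toy Monte Carlo: treat simulated fit-index values as draws from some distribution, take the cutoff to be an extreme quantile of those draws, and watch the run-to-run variability of the cutoff shrink as replications grow. The normal distribution below is a placeholder for illustration, not the DFI data-generating model:

```python
import numpy as np

def simulated_cutoff(n_reps, seed):
    """Cutoff = 95th percentile of fit-index values simulated under the
    fitted model (placeholder normal distribution for illustration)."""
    rng = np.random.default_rng(seed)
    sim_values = rng.normal(loc=0.04, scale=0.01, size=n_reps)
    return np.quantile(sim_values, 0.95)

def cutoff_spread(n_reps, n_runs=200):
    """Standard deviation of the cutoff across independent simulation runs."""
    return np.std([simulated_cutoff(n_reps, seed) for seed in range(n_runs)])
```

Comparing the spread at, say, 50 versus 500 replications shows the diminishing returns the study quantifies: beyond a few hundred replications, extra runs buy little additional cutoff stability.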

Citations: 0
Invariance: What Does Measurement Invariance Allow Us to Claim?
IF 2.1 | CAS Zone 3 (Psychology) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2024-10-28 | DOI: 10.1177/00131644241282982
John Protzko

Measurement involves numerous theoretical and empirical steps; ensuring our measures operate the same way in different groups is one of them. Measurement invariance occurs when the factor loadings and item intercepts or thresholds of a scale operate similarly for people at the same level of the latent variable in different groups. This is commonly assumed to mean the scale is measuring the same thing in those groups. Here we test the assumption of extending measurement invariance to mean common measurement by randomly assigning American adults (N = 1500) to fill out scales assessing a coherent factor (search for meaning in life) or a nonsense factor measuring nothing. We find a nonsense scale with items measuring nothing shows strong measurement invariance with the original scale, is reliable, and covaries with other constructs. We show measurement invariance can occur without measurement. Thus, we cannot infer that measurement invariance means one is measuring the same thing; it may be a necessary but not a sufficient condition.

Citations: 0
Detecting Differential Item Functioning Using Response Time.
IF 2.1 | CAS Zone 3 (Psychology) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2024-10-26 | DOI: 10.1177/00131644241280400
Qizhou Duan, Ying Cheng

This study investigated uniform differential item functioning (DIF) detection in response times. We proposed a regression analysis approach with both working speed and group membership as independent variables, and logarithm-transformed response times as the dependent variable. Effect size measures such as ΔR² and the percentage change in regression coefficients, in conjunction with statistical significance tests, were used to flag DIF items. A simulation study was conducted to assess the performance of three DIF detection criteria: (a) significance test, (b) significance test with ΔR², and (c) significance test with the percentage change in regression coefficients. The simulation study considered factors such as sample sizes, proportion of the focal group in relation to total sample size, number of DIF items, and the amount of DIF. The results showed that the significance test alone was too strict; using the percentage change in regression coefficients as an effect size measure reduced the flagging rate when the sample size was large, but the effect was inconsistent across different conditions; using ΔR² with the significance test reduced the flagging rate and was fairly consistent. The PISA 2018 data were used to illustrate the performance of the proposed method in a real dataset. Furthermore, we provide guidelines for conducting DIF studies with response time.
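The regression approach can be sketched directly: fit log response time on working speed alone, then on speed plus group membership, and take the R² gain as the DIF effect size. The data-generating setup and variable names below are illustrative, not the study's simulation design:

```python
import numpy as np

def r_squared(y, *predictors):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

def dif_delta_r2(log_rt, speed, group):
    """Uniform DIF effect size for response times: R^2 gain from adding
    group membership on top of working speed."""
    return r_squared(log_rt, speed, group) - r_squared(log_rt, speed)
```

A nonzero ΔR² indicates that, at the same working speed, the two groups differ systematically in how long the item takes, which is the response-time analogue of uniform DIF.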

Citations: 0
Assessing the Speed-Accuracy Tradeoff in Psychological Testing Using Experimental Manipulations.
IF 2.1 | CAS Zone 3 (Psychology) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2024-10-07 | DOI: 10.1177/00131644241271309
Tobias Alfers, Georg Gittler, Esther Ulitzsch, Steffi Pohl

The speed-accuracy tradeoff (SAT), where increased response speed often leads to decreased accuracy, is well established in experimental psychology. However, its implications for psychological assessments, especially in high-stakes settings, remain less understood. This study presents an experimental approach to investigate the SAT within a high-stakes spatial ability assessment. By manipulating instructions in a within-subjects design to induce speed variations in a large sample (N = 1,305) of applicants for an air traffic controller training program, we demonstrate the feasibility of manipulating working speed. Our findings confirm the presence of the SAT for most participants, suggesting that traditional ability scores may not fully reflect performance in high-stakes assessments. Importantly, we observed individual differences in the SAT, challenging the assumption of uniform SAT functions across test takers. These results highlight the complexity of interpreting high-stakes assessment outcomes and the influence of test conditions on performance dynamics. This study offers a valuable addition to the methodological toolkit for assessing the intraindividual relationship between speed and accuracy in psychological testing (including SAT research), providing a controlled approach while acknowledging the need to address potential confounders. Future research may apply this method across various cognitive domains, populations, and testing contexts to deepen our understanding of the SAT's broader implications for psychological measurement.

Citations: 0
On Latent Structure Examination of Behavioral Measuring Instruments in Complex Empirical Settings.
IF 2.1 | CAS Zone 3 (Psychology) | Q2 Mathematics, Interdisciplinary Applications | Pub Date: 2024-10-07 | DOI: 10.1177/00131644241281049
Tenko Raykov, Khaled Alkherainej

A multiple-step procedure is outlined that can be used for examining the latent structure of behavior measurement instruments in complex empirical settings. The method permits one to study their latent structure after assessing the need to account for clustering effects and the necessity of its examination within individual levels of fixed factors, such as gender or group membership of substantive relevance. The approach is readily applicable with binary or binary-scored items using popular and widely available software. The described procedure is illustrated with empirical data from a student behavior screening instrument.

Citations: 0
Interpretation of the Standardized Mean Difference Effect Size When Distributions Are Not Normal or Homoscedastic.
IF 2.1 Psychology, CAS Tier 3 Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-10-06 DOI: 10.1177/00131644241278928
Larry V Hedges

The standardized mean difference (sometimes called Cohen's d) is an effect size measure widely used to describe the outcomes of experiments. It is mathematically natural to describe differences between groups of data that are normally distributed with different means but the same standard deviation. In that context, it can be interpreted as determining several indexes of overlap between the two distributions. If the data are not approximately normally distributed or if they have substantially unequal standard deviations, the relation between d and overlap between distributions can be very different, and interpretations of d that apply when the data are normal with equal variances are unreliable.
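The overlap interpretation the abstract describes can be made concrete: for two normal distributions with equal variances, the pooled-SD Cohen's d determines the overlapping coefficient, Cohen's U3, and the probability of superiority in closed form. A minimal sketch of exactly that equal-variance normal case (the one the abstract warns is not general); the helper names are ours:

```python
import math
import statistics

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cohens_d(sample1, sample2):
    """Standardized mean difference with the pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    s_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

def overlap_indexes(d):
    """Overlap measures that are exact ONLY when both distributions are
    normal with equal variances -- the assumption the abstract cautions
    against taking for granted."""
    return {
        "OVL": 2 * normal_cdf(-abs(d) / 2),   # overlapping coefficient
        "U3": normal_cdf(d),                  # proportion of group 2 below group 1's mean
        "PS": normal_cdf(d / math.sqrt(2)),   # probability of superiority
    }
```

At d = 0 the two distributions coincide: OVL is 1 and both U3 and PS equal 0.5. When normality or homoscedasticity fails, these closed-form mappings from d to overlap no longer hold, which is precisely the article's point.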

Citations: 0
Enhancing Effort-Moderated Item Response Theory Models by Evaluating a Two-Step Estimation Method and Multidimensional Variations on the Model.
IF 2.1 Psychology, CAS Tier 3 Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2024-10-06 DOI: 10.1177/00131644241280727
Bowen Wang, Corinne Huggins-Manley, Huan Kuang, Jiawei Xiong

Rapid-guessing behavior in data can compromise our ability to estimate item and person parameters accurately. Consequently, it is crucial to model data with rapid-guessing patterns in a way that can produce unbiased ability estimates. This study proposes and evaluates three alternative modeling approaches that follow the logic of the effort-moderated item response theory model (EM-IRT) to analyze response data with rapid-guessing responses. One is the two-step EM-IRT model, which utilizes the item parameters estimated by respondents without rapid-guessing behavior and was initially proposed by Rios and Soland without further investigation. The other two models are effort-moderated multidimensional models (EM-MIRT), which we introduce in this study and vary as both between-item and within-item structures. The advantage of the EM-MIRT model is to account for the underlying relationship between rapid-guessing propensity and ability. The three models were compared with the traditional EM-IRT model regarding the accuracy of parameter recovery in various simulated conditions. Results demonstrated that the two-step EM-IRT and between-item EM-MIRT model consistently outperformed the traditional EM-IRT model under various conditions, with the two-step EM-IRT estimation generally delivering the best performance, especially for ability and item difficulty parameters estimation. In addition, different rapid-guessing patterns (i.e., difficulty-based, changing state, and decreasing effort) did not affect the performance of the two-step EM-IRT model. Overall, the findings suggest that the EM-IRT model with the two-step parameter estimation method can be applied in practice for estimating ability in the presence of rapid-guessing responses due to its accuracy and efficiency. The between-item EM-MIRT model can be used as an alternative model when there is no significant mean difference in the ability estimates between examinees who exhibit rapid-guessing behavior and those who do not.
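The core of the effort-moderated approach is a switch in the item response function: when response time signals solution behavior the usual 2PL curve applies, and when it signals a rapid guess the success probability drops to the chance rate. A minimal sketch of that moderation logic, assuming a 2PL base model; the time threshold and option count are illustrative, not values from the study:

```python
import math

def em_irt_prob(theta, a, b, rt, threshold, n_options=4):
    """Effort-moderated response probability in the spirit of EM-IRT:
    a 2PL item response function when the response time indicates
    solution behavior, and the chance rate when it indicates a rapid
    guess. `threshold` and `n_options` here are illustrative."""
    if rt >= threshold:
        # Solution behavior: standard two-parameter logistic model
        return 1 / (1 + math.exp(-a * (theta - b)))
    # Rapid guess: probability of success is chance level
    return 1 / n_options
```

For an examinee at theta = 0 on an item with a = 1 and b = 0, the model returns 0.5 under solution behavior but only the chance rate 0.25 (with four options) under a rapid guess, so flagged responses no longer distort the ability estimate through the 2PL curve.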

Citations: 0