
Latest Publications in Applied Psychological Measurement

Impact of Parameter Predictability and Joint Modeling of Response Accuracy and Response Time on Ability Estimates.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2025-02-26 · DOI: 10.1177/01466216251322646
Maryam Pezeshki, Susan Embretson

To maintain test quality, a large supply of items is typically desired. Automatic item generation can reduce cost and labor, especially if the generated items have predictable item parameters, possibly reducing or eliminating the need for empirical tryout. However, the effect of different levels of item parameter predictability on the accuracy of trait estimation under item response theory models is unclear. If predictability is lower, adding response time as a collateral source of information may mitigate the effect on trait estimation accuracy. The present study investigates the impact of varying item parameter predictability on trait estimation accuracy, along with the impact of adding response time as a collateral source of information. Results indicated that trait estimation accuracy using item family model-based item parameters differed only slightly from that using known item parameters. Somewhat larger trait estimation errors resulted from using cognitive complexity features to predict item parameters. Further, adding response times to the model resulted in more accurate trait estimation for tests with lower item difficulty levels (e.g., achievement tests). Implications for item generation and for the response-processes aspect of validity are discussed.
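
The first comparison in this abstract can be illustrated with a minimal simulation sketch (not the authors' design): simulate 2PL responses, then estimate ability by EAP using either the true item parameters or noisy "predicted" ones, and compare recovery error. The item counts, noise levels, and prior below are arbitrary assumptions, and the joint response-time component of the study is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 1000, 30
theta = rng.normal(0, 1, n_persons)          # true abilities
a = rng.lognormal(0, 0.3, n_items)           # true discriminations
b = rng.normal(0, 1, n_items)                # true difficulties

# Simulate 2PL item responses
prob = 1 / (1 + np.exp(-a * (theta[:, None] - b)))
x = rng.binomial(1, prob)

# "Predicted" parameters: true values plus prediction error; the noise SDs
# stand in for higher vs. lower item parameter predictability
a_pred = a * np.exp(rng.normal(0, 0.15, n_items))
b_pred = b + rng.normal(0, 0.30, n_items)

def eap(x, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimates under a 2PL with a standard normal prior."""
    p = 1 / (1 + np.exp(-a[None, :] * (grid[:, None] - b[None, :])))  # grid x items
    loglik = x @ np.log(p).T + (1 - x) @ np.log(1 - p).T              # persons x grid
    post = np.exp(loglik) * np.exp(-grid ** 2 / 2)                    # times N(0,1) prior
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid

for label, (aa, bb) in {"known parameters": (a, b),
                        "predicted parameters": (a_pred, b_pred)}.items():
    rmse = np.sqrt(np.mean((eap(x, aa, bb) - theta) ** 2))
    print(f"{label}: RMSE = {rmse:.3f}")
```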

Citations: 0
Few and Different: Detecting Examinees With Preknowledge Using Extended Isolation Forests.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2025-02-20 · DOI: 10.1177/01466216251320403
Nate R Smith, Lisa A Keller, Richard A Feinberg, Chunyan Liu

Item preknowledge refers to the case where examinees have advance knowledge of test material prior to taking the examination. When examinees have item preknowledge, the scores that result from those item responses are not true reflections of the examinee's proficiency. Further, this contamination in the data also affects the item parameter estimates and therefore the scores of all examinees, regardless of whether they had prior knowledge. To ensure the validity of test scores, it is essential to identify both issues: compromised items (CIs) and examinees with preknowledge (EWPs). In some cases, the CIs are known, and the task reduces to determining the EWPs. However, due to the potential threat to validity, it is critical for high-stakes testing programs to have a process for routinely monitoring for evidence of EWPs, often when CIs are unknown. Further, even knowing that specific items may have been compromised does not guarantee that any examinees had prior access to those items, or that the examinees who did have prior access know how to use the preknowledge effectively. Therefore, this paper attempts to use response behavior to identify item preknowledge without knowledge of which items may or may not have been compromised. While most research in this area has relied on traditional psychometric models, we investigate the utility of an unsupervised machine learning algorithm, the extended isolation forest (EIF), to detect EWPs. Similar to previous research, the response behavior being analyzed is response time (RT) and response accuracy (RA).
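
A hedged sketch of the detection idea: the study uses the extended isolation forest, but as a stand-in the snippet below applies scikit-learn's standard IsolationForest to two illustrative per-examinee features (proportion correct and mean log response time). The simulated data, feature set, and contamination rate are assumptions made only for this example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
n_honest, n_ewp, n_items = 950, 50, 60

# Honest examinees: typical accuracy and mean log response times
acc_honest = rng.binomial(n_items, 0.65, n_honest) / n_items
logrt_honest = rng.normal(4.0, 0.3, n_honest)

# Examinees with preknowledge (EWPs): unusually fast and unusually accurate
acc_ewp = rng.binomial(n_items, 0.92, n_ewp) / n_items
logrt_ewp = rng.normal(3.2, 0.3, n_ewp)

X = np.column_stack([np.r_[acc_honest, acc_ewp], np.r_[logrt_honest, logrt_ewp]])
truth = np.r_[np.zeros(n_honest), np.ones(n_ewp)]

forest = IsolationForest(n_estimators=200, contamination=0.05, random_state=0)
flagged = forest.fit_predict(X) == -1      # -1 marks the "few and different" points

print(f"flagged {flagged.sum()} examinees, {int(truth[flagged].sum())} of them true EWPs")
```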

Citations: 0
Application of Bayesian Decision Theory in Detecting Test Fraud.
IF 1.0 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2025-01-27 · DOI: 10.1177/01466216251316559
Sandip Sinharay, Matthew S Johnson

This article suggests a new approach based on Bayesian decision theory (e.g., Cronbach & Gleser, 1965; Ferguson, 1967) for detection of test fraud. The approach leads to a simple decision rule that involves the computation of the posterior probability that an examinee committed test fraud given the data. The suggested approach was applied to a real data set that involved actual test fraud.
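
A toy numeric sketch of such a decision rule (the prior, likelihoods, and losses are invented for illustration and are not from the article): compute the posterior probability of fraud from a base rate and from the likelihood of the observed evidence under "fraud" versus "no fraud", then flag the examinee only if that posterior exceeds the loss-ratio threshold implied by Bayesian decision theory.

```python
# Illustrative inputs (not from the article)
prior_fraud = 0.02      # assumed base rate of test fraud
lik_fraud = 0.40        # P(observed evidence | fraud)
lik_clean = 0.01        # P(observed evidence | no fraud)

# Posterior probability that this examinee committed fraud
posterior = (prior_fraud * lik_fraud) / (
    prior_fraud * lik_fraud + (1 - prior_fraud) * lik_clean
)

# Decision rule minimizing expected loss: flag iff
# posterior * loss_miss > (1 - posterior) * loss_false_flag
loss_false_flag = 10.0  # cost of flagging an innocent examinee
loss_miss = 1.0         # cost of missing actual fraud
threshold = loss_false_flag / (loss_false_flag + loss_miss)

print(f"posterior = {posterior:.3f}, threshold = {threshold:.3f}, "
      f"flag = {posterior > threshold}")
```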

Citations: 0
Evaluating the Construct Validity of Instructional Manipulation Checks as Measures of Careless Responding to Surveys.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-11-01 · Epub Date: 2024-09-20 · DOI: 10.1177/01466216241284293
Mark C Ramsey, Nathan A Bowling, Preston S Menke

Careless responding measures are important for several purposes, whether it's screening for careless responding or for research centered on careless responding as a substantive variable. One such approach for assessing carelessness in surveys is the use of an instructional manipulation check. Despite its apparent popularity, little is known about the construct validity of instructional manipulation checks as measures of careless responding. Initial results are inconclusive, and no study has thoroughly evaluated the validity of the instructional manipulation check as a measure of careless responding. Across 2 samples (N = 762), we evaluated the construct validity of the instructional manipulation check under a nomological network. We found that the instructional manipulation check converged poorly with other measures of careless responding, weakly predicted participant inability to recognize study content, and did not display incremental validity over existing measures of careless responding. Additional analyses revealed that instructional manipulation checks performed poorly compared to single scores of other alternative careless responding measures and that screening data with alternative measures of careless responding produced greater or similar gains in data quality to instructional manipulation checks. Based on the results of our studies, we do not recommend using instructional manipulation checks to assess or screen for careless responding to surveys.

Citations: 0
Estimating Test-Retest Reliability in the Presence of Self-Selection Bias and Learning/Practice Effects.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-11-01 · Epub Date: 2024-09-17 · DOI: 10.1177/01466216241284585
William C M Belzak, J R Lockwood

Test-retest reliability is often estimated using naturally occurring data from test repeaters. In settings such as admissions testing, test takers choose if and when to retake an assessment. This self-selection can bias estimates of test-retest reliability because individuals who choose to retest are typically unrepresentative of the broader testing population and because differences among test takers in learning or practice effects may increase with time between test administrations. We develop a set of methods for estimating test-retest reliability from observational data that can mitigate these sources of bias, which include sample weighting, polynomial regression, and Bayesian model averaging. We demonstrate the value of using these methods for reducing bias and improving precision of estimated reliability using empirical and simulated data, both of which are based on more than 40,000 repeaters of a high-stakes English language proficiency test. Finally, these methods generalize to settings in which only a single, error-prone measurement is taken repeatedly over time and where self-selection and/or changes to the underlying construct may be at play.
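
One ingredient named in the abstract, sample weighting, can be sketched as an inverse-probability-weighted correlation between first and second attempts; the retest-probability model, the polynomial-regression adjustment, and the Bayesian model averaging used in the paper are omitted, and all numbers below are simulated assumptions.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Pearson correlation with observation weights."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    return cov / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))

rng = np.random.default_rng(7)
n = 5000
true_score = rng.normal(0, 1, n)
test1 = true_score + rng.normal(0, 0.5, n)       # first attempt
test2 = true_score + rng.normal(0, 0.5, n)       # hypothetical second attempt

# Self-selection: examinees with low first scores are more likely to retest
p_retest = 1 / (1 + np.exp(2.0 * test1))
retested = rng.random(n) < p_retest

naive = np.corrcoef(test1[retested], test2[retested])[0, 1]
ipw = weighted_corr(test1[retested], test2[retested], 1 / p_retest[retested])
print(f"naive retest r = {naive:.3f}, inverse-probability-weighted r = {ipw:.3f}")
```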

Citations: 0
A Mark-Recapture Approach to Estimating Item Pool Compromise.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-11-01 · Epub Date: 2024-09-13 · DOI: 10.1177/01466216241284410
Richard A Feinberg

Testing organizations routinely investigate whether secure exam material has been compromised and is consequently invalid for scoring and for inclusion on future assessments. Beyond identifying individual compromised items, knowing the degree to which a form is compromised can inform decisions on whether the form can no longer be administered, or whether an item pool is compromised to such an extent that serious action on a broad scale must be taken to ensure the validity of score interpretations. Previous research on estimating the population of compromised items is sparse; however, closely related population-estimation problems have long been studied in ecological research. In this note, we exemplify the utility of the mark-recapture technique to estimate the population of compromised items, first through a brief demonstration to introduce the fundamental concepts and then a more realistic scenario to illustrate applicability to large-scale testing programs. An effective use of this technique would be to longitudinally track changes in the estimated population to inform operational test security strategies. Many variations on mark-recapture exist, and interpretation of the estimated population depends on several factors. Thus, this note is only meant to introduce the concept of mark-recapture as a useful application to evaluate a testing organization's compromise mitigation procedures.
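
The core idea is easy to show with the simplest mark-recapture estimators (the note's brief demonstration may use a different variant; the numbers here are invented). If a first sweep for exposed content finds n1 compromised items, an independent second sweep finds n2, and m items appear in both, the Lincoln-Petersen estimate of the total compromised pool is n1*n2/m; Chapman's correction is less biased when the overlap m is small.

```python
def lincoln_petersen(n1, n2, m):
    """Classic mark-recapture estimate of the total compromised-item count."""
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-corrected version, safer when the overlap m is small."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Illustrative sweeps: 40 items found in the first search, 35 in the second,
# 14 of which had already been found the first time
n1, n2, m = 40, 35, 14
print(f"Lincoln-Petersen estimate: {lincoln_petersen(n1, n2, m):.0f} compromised items")
print(f"Chapman estimate:          {chapman(n1, n2, m):.0f} compromised items")
```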

Citations: 0
Effect of Differential Item Functioning on Computer Adaptive Testing Under Different Conditions.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-11-01 · Epub Date: 2024-09-17 · DOI: 10.1177/01466216241284295
Merve Sahin Kursad, Seher Yalcin

This study provides an overview of the effect of differential item functioning (DIF) on measurement precision, test information function (TIF), and test effectiveness in computer adaptive tests (CATs). Simulated data for the study were produced and analyzed with RStudio. During the data generation process, item pool size, DIF type, DIF percentage, item selection method for CAT, and the test termination rules were treated as manipulated conditions. Sample size and ability parameter distribution, Item Response Theory (IRT) model, DIF size, ability estimation method, test starting rule, and item usage frequency method regarding CAT conditions were held fixed. To examine the effect of DIF, measurement precision, TIF, and test effectiveness were calculated. Results show DIF has negative effects on measurement precision, TIF, and test effectiveness. In particular, statistically significant effects of the percentage of DIF items and DIF type are observed on measurement precision.
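
A small sketch of one outcome the study tracks, the test information function (TIF) under a 2PL, showing how uniform DIF (a difficulty shift on some items for the focal group) changes the information those same items provide to that group. The item parameters, DIF size, and percentage of DIF items below are arbitrary choices, not the study's simulation design.

```python
import numpy as np

def item_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)

rng = np.random.default_rng(3)
n_items = 40
a = rng.lognormal(0, 0.3, n_items)
b = rng.normal(0, 1, n_items)

# Uniform DIF: 20% of items are 0.6 logits harder for the focal group
dif_items = rng.choice(n_items, size=n_items // 5, replace=False)
b_focal = b.copy()
b_focal[dif_items] += 0.6

for theta in np.linspace(-3, 3, 7):
    tif_ref = item_info_2pl(theta, a, b).sum()
    tif_focal = item_info_2pl(theta, a, b_focal).sum()
    print(f"theta = {theta:+.1f}: TIF reference = {tif_ref:5.2f}, TIF focal = {tif_focal:5.2f}")
```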

Citations: 0
Item Response Modeling of Clinical Instruments With Filter Questions: Disentangling Symptom Presence and Severity.
IF 1.2 · CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2024-09-01 · Epub Date: 2024-06-17 · DOI: 10.1177/01466216241261709
Brooke E Magnus

Clinical instruments that use a filter/follow-up response format often produce data with excess zeros, especially when administered to nonclinical samples. When the unidimensional graded response model (GRM) is then fit to these data, parameter estimates and scale scores tend to suggest that the instrument measures individual differences only among individuals with severe levels of the psychopathology. In such scenarios, alternative item response models that explicitly account for excess zeros may be more appropriate. The multivariate hurdle graded response model (MH-GRM), which has been previously proposed for handling zero-inflated questionnaire data, includes two latent variables: susceptibility, which underlies responses to the filter question, and severity, which underlies responses to the follow-up question. Using both simulated and empirical data, the current research shows that compared to unidimensional GRMs, the MH-GRM is better able to capture individual differences across a wider range of psychopathology, and that when unidimensional GRMs are fit to data from questionnaires that include filter questions, individual differences at the lower end of the severity continuum largely go unmeasured. Practical implications are discussed.
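
A data-generating sketch of the two-part structure described above, under assumed parameter values (this is not the authors' estimation code): a 2PL "susceptibility" process determines whether the symptom is endorsed at all, and a graded-response "severity" process generates the follow-up category only for endorsers; non-endorsers are recorded as zero.

```python
import numpy as np

rng = np.random.default_rng(11)
n_persons, n_items = 2000, 10

susceptibility = rng.normal(0, 1, n_persons)
severity = 0.6 * susceptibility + rng.normal(0, 0.8, n_persons)   # correlated latent traits

# For brevity every item shares the same parameters (a real instrument would not)
a_sus, b_sus = 1.2, 0.5                       # filter ("symptom present?") parameters
a_sev = 1.5
thresholds = np.array([-0.5, 0.5, 1.5])       # ordered GRM thresholds for categories 2..4

def simulate_item():
    present = rng.random(n_persons) < 1 / (1 + np.exp(-a_sus * (susceptibility - b_sus)))
    # GRM follow-up: P(Y >= k) = logistic(a_sev * (severity - threshold_k)), categories 1..4
    p_ge = 1 / (1 + np.exp(-a_sev * (severity[:, None] - thresholds[None, :])))
    category = 1 + (rng.random((n_persons, 1)) < p_ge).sum(axis=1)
    return np.where(present, category, 0)     # 0 = symptom absent (filter not passed)

data = np.column_stack([simulate_item() for _ in range(n_items)])
print("proportion of zero responses:", (data == 0).mean().round(2))
```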

Citations: 0
Using Auxiliary Item Information in the Item Parameter Estimation of a Graded Response Model for a Small to Medium Sample Size: Empirical Versus Hierarchical Bayes Estimation
CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2023-11-03 · DOI: 10.1177/01466216231209758
Matthew Naveiras, Sun-Joo Cho
Marginal maximum likelihood estimation (MMLE) is commonly used for item response theory item parameter estimation. However, sufficiently large sample sizes are not always possible when studying rare populations. In this paper, empirical Bayes and hierarchical Bayes are presented as alternatives to MMLE in small sample sizes, using auxiliary item information to estimate the item parameters of a graded response model with higher accuracy. Empirical Bayes and hierarchical Bayes methods are compared with MMLE to determine under what conditions these Bayes methods can outperform MMLE, and to determine if hierarchical Bayes can act as an acceptable alternative to MMLE in conditions where MMLE is unable to converge. In addition, empirical Bayes and hierarchical Bayes methods are compared to show how hierarchical Bayes can result in estimates of posterior variance with greater accuracy than empirical Bayes by acknowledging the uncertainty of item parameter estimates. The proposed methods were evaluated via a simulation study. Simulation results showed that hierarchical Bayes methods can be acceptable alternatives to MMLE under various testing conditions, and we provide a guideline to indicate which methods would be recommended in different research situations. R functions are provided to implement these proposed methods.
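
The role of auxiliary item information can be caricatured with a normal-normal shrinkage step (a conceptual sketch only; the paper works with the full graded response model and provides its own R functions, which are not reproduced here): noisy small-sample item estimates are pulled toward predictions derived from auxiliary item features, with more shrinkage when the estimate is less precise.

```python
import numpy as np

rng = np.random.default_rng(5)
n_items = 20
b_true = rng.normal(0, 1, n_items)

# Auxiliary item information predicts difficulty imperfectly (prior means)
b_aux = b_true + rng.normal(0, 0.4, n_items)
tau2 = 0.4 ** 2                               # assumed prior variance around the prediction

# Small-sample item estimates with varying precision
se = rng.uniform(0.3, 0.6, n_items)
b_hat = b_true + rng.normal(0, se)

# Normal-normal posterior mean: precision-weighted blend of estimate and prior
b_post = (b_hat / se ** 2 + b_aux / tau2) / (1 / se ** 2 + 1 / tau2)

rmse = lambda est: np.sqrt(np.mean((est - b_true) ** 2))
print(f"RMSE, raw estimates: {rmse(b_hat):.3f}; with auxiliary prior: {rmse(b_post):.3f}")
```
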
Citations: 1
A Bayesian Random Weights Linear Logistic Test Model for Within-Test Practice Effects
CAS Tier 4 (Psychology) · Q4 (Psychology, Mathematical) · Pub Date: 2023-11-01 · DOI: 10.1177/01466216231209752
José H. Lozano, Javier Revuelta
The present paper introduces a random weights linear logistic test model for the measurement of individual differences in operation-specific practice effects within a single administration of a test. The proposed model is an extension of the linear logistic test model of learning developed by Spada (1977) in which the practice effects are considered random effects varying across examinees. A Bayesian framework was used for model estimation and evaluation. A simulation study was conducted to examine the behavior of the model in combination with the Bayesian procedures. The results demonstrated the good performance of the estimation and evaluation methods. Additionally, an empirical study was conducted to illustrate the applicability of the model to real data. The model was applied to a sample of responses from a logical ability test providing evidence of individual differences in operation-specific practice effects.
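
One plausible way to write such a model (a notational sketch under assumed structure, not necessarily the authors' exact parameterization) adds a person-specific random practice weight to Spada's linear logistic test model of learning:

```latex
\operatorname{logit} P(X_{pi} = 1)
  = \theta_p - \sum_{k} q_{ik}\,\eta_k + \sum_{k} q_{ik}\, t_{ik}\,\delta_{pk},
\qquad \delta_{pk} \sim N(\mu_k, \sigma_k^2),
```

where q_ik indicates whether item i requires operation k, eta_k is that operation's difficulty contribution, t_ik counts the examinee's prior within-test opportunities to practice operation k, and delta_pk is the person-specific random practice effect whose variance captures individual differences in learning from practice.
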
Citations: 0