Complete Q-matrices in conjunctive models on general attribute structures
Jürgen Heller
British Journal of Mathematical & Statistical Psychology, 75(3), 522–549. doi:10.1111/bmsp.12266

In cognitive diagnostic assessment, a property of the Q-matrix usually referred to as completeness guarantees that the cognitive attributes underlying the observed behaviour can be assessed uniquely. Characterizations of completeness were first derived under the assumption of independent attributes and are currently under investigation for interdependent attributes. The dominant approach considers so-called attribute hierarchies, which are conceptualized through a partial order on the set of attributes. The present paper extends previously published results obtained for conjunctive attribute hierarchy models. Drawing upon results from knowledge structure theory, it provides novel necessary and sufficient conditions for completeness of the Q-matrix, not only for conjunctive models on attribute hierarchies but also on more general attribute structures.
Refinement: Measuring informativeness of ratings in the absence of a gold standard
Sheridan Grant, Marina Meilă, Elena Erosheva, Carole Lee
British Journal of Mathematical & Statistical Psychology, 75(3), 593–615. doi:10.1111/bmsp.12268

We propose a new metric for evaluating the informativeness of a set of ratings from a single rater on a given scale. Such evaluations are of interest when raters rate numerous comparable items on the same scale, as occurs in hiring, college admissions, and peer review. Our exposition takes the context of peer review, which involves univariate and multivariate cardinal ratings. We draw on this context to motivate an information-theoretic measure of the refinement of a set of ratings – entropic refinement – as well as two secondary measures. A mathematical analysis of the three measures reveals that only the first, which captures the information content of the ratings, possesses properties appropriate to a refinement metric. Finally, we analyse refinement in real-world grant-review data, finding evidence that overall merit scores are more refined than criterion scores.
The evidence interval and the Bayesian evidence value: On a unified theory for Bayesian hypothesis testing and interval estimation
Riko Kelter
British Journal of Mathematical & Statistical Psychology, 75(3), 550–592. doi:10.1111/bmsp.12267

Interval estimation is one of the most frequently used methods in statistical science, employed to provide a range of credible values in which a parameter is located after taking the uncertainty in the data into account. This interpretation, however, holds only for Bayesian interval estimates, and these suffer from two problems. First, Bayesian interval estimates can include values which have not been corroborated by observing the data. Second, Bayesian interval estimates and hypothesis tests can yield contradictory conclusions. In this paper a new theory for Bayesian hypothesis testing and interval estimation is presented. A new interval estimate is proposed, the Bayesian evidence interval, which is inspired by the Pereira–Stern theory of the full Bayesian significance test (FBST). It is shown that the evidence interval is a generalization of existing Bayesian interval estimates, that it solves the problems of standard Bayesian interval estimates, and that it unifies Bayesian hypothesis testing and parameter estimation. The Bayesian evidence value is introduced, which quantifies the evidence for the (interval) null and alternative hypotheses. Based on the evidence interval and the evidence value, the (full) Bayesian evidence test (FBET) is proposed as a new, model-independent Bayesian hypothesis test. Additionally, a decision rule for hypothesis testing is derived, which shows the relationship to a widely used decision rule based on the region of practical equivalence and Bayesian highest posterior density intervals, and to the e-value in the FBST. In summary, the proposed method is universally applicable and computationally efficient; the evidence interval can be seen as an extension of existing Bayesian interval estimates, and the FBET generalizes the FBST, containing it as a special case. Together, the theory developed provides a unification of Bayesian hypothesis testing and interval estimation and is made available in the R package fbst.
Reliability coefficients for multiple group item response theory models
Björn Andersson, Hao Luo, Kseniia Marcq
British Journal of Mathematical & Statistical Psychology, 75(2), 395–410. doi:10.1111/bmsp.12269

Reliability of scores from psychological or educational assessments provides important information regarding the precision of measurement. The reliability of scores is, however, population-dependent and may vary across groups. In item response theory, this population dependence can be attributed to differential item functioning or to differences in the latent distributions between groups, and it needs to be accounted for when estimating the reliability of scores for different groups. Here, we introduce group-specific and overall reliability coefficients, for both sum scores and maximum likelihood ability estimates, defined by a multiple group item response theory model. We derive confidence intervals using asymptotic theory and evaluate the empirical properties of the estimators and the confidence intervals in a simulation study. The results show that the estimators are largely unbiased and that the confidence intervals are accurate with moderately large sample sizes. We exemplify the approach with the Montreal Cognitive Assessment (MoCA) in two groups defined by education level and give recommendations for applied work.
Remarkable properties for diagnostics and inference of ranking data modelling
Cristina Mollica, Luca Tardella
British Journal of Mathematical & Statistical Psychology, 75(2), 334–362. doi:10.1111/bmsp.12260

The Plackett-Luce (PL) model for ranked data assumes a forward order of the ranking process: positions are assigned sequentially, from the top (most-liked) to the bottom (least-liked) alternative. This assumption has recently been relaxed in the Extended Plackett-Luce (EPL) model through the introduction of a discrete reference order parameter describing the rank attribution path. Starting from two formal properties of the EPL, namely the inverse ordering of the item probabilities at the first and last stages of the ranking process and the well-known independence of irrelevant alternatives (Luce's choice axiom), we derive novel diagnostic tools for testing the appropriateness of the EPL as the actual sampling distribution of the observed rankings. These diagnostic tools can help uncover possible idiosyncratic paths in the sequential choice process. Besides helping to fill the gap in goodness-of-fit methods for the family of multistage models, we show how one of the two statistics can be exploited to construct a heuristic method that serves as a surrogate for the maximum likelihood approach to inferring the underlying reference order parameter. The relative performance of the proposals, compared with more conventional approaches, is illustrated by means of extensive simulation studies.
Ilyas Bakbergenuly, David C. Hoaglin, Elena Kulinskaya
<p>Cochran's <i>Q</i> statistic is routinely used for testing heterogeneity in meta-analysis. Its expected value is also used in several popular estimators of the between-study variance, <math>