Sequentially Determined Measures of Interobserver Agreement (Kappa) in Clinical Trials May Vary Independent of Changes in Observer Performance
Russell Reeve, Klaus Gottlieb
Journal: Zbornik Matice Srpske za Prirodne Nauke
DOI: 10.1177/2168479019874059 (https://doi.org/10.1177/2168479019874059)
Published: 2019-09-30
Abstract
Background: Cohen's kappa is a statistic that estimates interobserver agreement. It was originally introduced to help develop diagnostic tests: the interpretative readings of 2 observers, for example of a mammogram or other imaging study, were compared at a single point in time. Kappa is known to depend on the prevalence of disease, so kappa values obtained in different settings are hard to compare.
Methods: Using simulation, we examine an analogous situation, not previously described, that occurs in clinical trials where sequential measurements are obtained to evaluate disease progression or clinical improvement over time.
Results: We show that weighted kappa, used for multilevel outcomes, changes during the trial even if we keep the performance of the observer constant.
Conclusions: Kappa and closely related measures can therefore only be used with great difficulty, if at all, in quality assurance in clinical trials.
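The effect described in the Results can be illustrated with a minimal simulation sketch. It is not the authors' simulation; every specific in it is an assumption made for illustration: a hypothetical 4-level ordinal score (grades 0 to 3), two observers who each read the true grade correctly with probability 0.8 and are otherwise off by one grade, linear disagreement weights, and two invented mixes of true grades (`baseline` and `late`). The reading model is identical at both visits; only the distribution of true grades changes.

```python
import random

GRADES = 4  # assumption: hypothetical 4-level ordinal score, grades 0..3


def read(true_grade, rng, p_correct=0.8):
    """One observer's reading under a fixed (assumed) error model:
    correct with probability p_correct, otherwise one grade off."""
    if rng.random() < p_correct:
        return true_grade
    neighbours = [g for g in (true_grade - 1, true_grade + 1) if 0 <= g < GRADES]
    return rng.choice(neighbours)


def weighted_kappa(a, b, k=GRADES):
    """Cohen's weighted kappa with linear disagreement weights |i - j|."""
    n = len(a)
    obs = [[0.0] * k for _ in range(k)]  # observed joint proportions
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    observed = sum(abs(i - j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0.0:  # every reading identical: kappa is undefined
        return float("nan")
    return 1.0 - observed / expected


def simulate_visit(true_dist, n, rng):
    """Draw n true grades, have two independent observers read each one,
    and return the weighted kappa between the two sets of readings."""
    truths = rng.choices(range(GRADES), weights=true_dist, k=n)
    reader_a = [read(t, rng) for t in truths]
    reader_b = [read(t, rng) for t in truths]
    return weighted_kappa(reader_a, reader_b)


rng = random.Random(0)
baseline = [0.10, 0.30, 0.40, 0.20]  # assumed mix of true grades at enrolment
late = [0.85, 0.10, 0.04, 0.01]      # assumed mix after most patients improve

kappa_baseline = simulate_visit(baseline, 20000, rng)
kappa_late = simulate_visit(late, 20000, rng)
print(f"weighted kappa at baseline: {kappa_baseline:.2f}")
print(f"weighted kappa late in trial: {kappa_late:.2f}")
```

Under these assumed inputs, weighted kappa comes out near 0.65 at baseline and near 0.42 late in the trial, although `read` behaves identically at both visits. The drop comes entirely from the chance-expected disagreement shrinking as the true grades concentrate in one category.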