Nathan Stehouwer, Anastasia Rowland-Seymour, Larry Gruppen, Jeffrey M Albert, Kelli Qua
Validity and reliability of Brier scoring for assessment of probabilistic diagnostic reasoning.
Diagnosis. Published 2024-10-16. DOI: 10.1515/dx-2023-0109 (https://doi.org/10.1515/dx-2023-0109)
Impact factor: 2.2. JCR: Q2 (Medicine, General & Internal).
Citations: 0
Abstract
Objectives: Educators need tools for the assessment of clinical reasoning that reflect the ambiguity of real-world practice and measure learners' ability to determine diagnostic likelihood. In this study, the authors describe the use of the Brier score to assess and provide feedback on the quality of probabilistic diagnostic reasoning.
Methods: The authors describe a novel format called Diagnostic Forecasting (DxF), in which participants read a brief clinical case and assign a probability to each item on a differential diagnosis, order tests and select a final diagnosis. DxF was piloted in a cohort of senior medical students. DxF evaluated students' answers with Brier scores, which compare probabilistic forecasts with case outcomes. The validity of Brier scores in DxF was assessed by comparison to subsequent decision-making in the game environment of DxF, as well as external criteria including medical knowledge tests and performance on clinical rotations.
Results: Brier scores were statistically significantly correlated with diagnostic accuracy (95 % CI -4.4 to -0.44) and with mean scores on the National Board of Medical Examiners (NBME) shelf exams (95 % CI -474.6 to -225.1). Brier scores did not correlate with clerkship grades or performance on a structured clinical skills exam. Reliability as measured by within-student correlation was low.
Conclusions: Brier scoring showed evidence for validity as a measurement of medical knowledge and predictor of clinical decision-making. Further work must evaluate the ability of Brier scores to predict clinical and workplace-based outcomes, and develop reliable approaches to measuring probabilistic reasoning.
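For readers unfamiliar with the metric, the following is a minimal sketch of how a Brier score compares a probabilistic differential diagnosis with the case outcome. It assumes the common multi-category form (the sum of squared differences between each forecast probability and a 0/1 outcome indicator); the abstract does not state which variant DxF uses. The function names and example diagnoses are hypothetical and for illustration only.

```python
from typing import Mapping, Sequence, Tuple


def brier_score(forecast: Mapping[str, float], true_diagnosis: str) -> float:
    """Multi-category Brier score for a single case (lower is better).

    Assumption: score = sum over diagnoses of (probability - outcome)^2,
    where outcome is 1 for the confirmed diagnosis and 0 otherwise.
    """
    if true_diagnosis not in forecast:
        raise ValueError("the confirmed diagnosis must appear in the forecast")
    for dx, p in forecast.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {dx!r} is outside [0, 1]")
    return sum(
        (p - (1.0 if dx == true_diagnosis else 0.0)) ** 2
        for dx, p in forecast.items()
    )


def mean_brier_score(cases: Sequence[Tuple[Mapping[str, float], str]]) -> float:
    """Average the per-case Brier scores across several cases."""
    if not cases:
        raise ValueError("at least one case is required")
    return sum(brier_score(f, dx) for f, dx in cases) / len(cases)


# Hypothetical differential for one case; pneumonia is the confirmed diagnosis.
calibrated = {"pneumonia": 0.6, "pulmonary embolism": 0.3, "heart failure": 0.1}
print(round(brier_score(calibrated, "pneumonia"), 3))  # 0.26

# A confident but wrong forecast is penalized heavily.
overconfident = {"pneumonia": 0.9, "pulmonary embolism": 0.05, "heart failure": 0.05}
print(round(brier_score(overconfident, "pulmonary embolism"), 3))  # 1.715
```

Because lower scores indicate better forecasts under this form, a negative association between Brier scores and accuracy or exam performance would mean that better forecasters performed better, which is consistent with the negative confidence intervals reported in the Results.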
About the journal:
Diagnosis focuses on how diagnosis can be advanced, how it is taught, and how and why it can fail, leading to diagnostic errors. The journal welcomes both fundamental and applied works, improvement initiatives, opinions, and debates to encourage new thinking on improving this critical aspect of healthcare quality.
Topics:
- Factors that promote diagnostic quality and safety
- Clinical reasoning
- Diagnostic errors in medicine
- The factors that contribute to diagnostic error: human factors, cognitive issues, and system-related breakdowns
- Improving the value of diagnosis – eliminating waste and unnecessary testing
- How culture and removing blame promote awareness of diagnostic errors
- Training and education related to clinical reasoning and diagnostic skills
- Advances in laboratory testing and imaging that improve diagnostic capability
- Local, national and international initiatives to reduce diagnostic error