Raising awareness of potential biases in medical machine learning: Experience from a Datathon.

Harry Hochheiser, Jesse Klug, Thomas Mathie, Tom J Pollard, Jesse D Raffa, Stephanie L Ballard, Evamarie A Conrad, Smitha Edakalavan, Allan Joseph, Nader Alnomasy, Sarah Nutman, Veronika Hill, Sumit Kapoor, Eddie Pérez Claudio, Olga V Kravchenko, Ruoting Li, Mehdi Nourelahi, Jenny Diaz, W Michael Taylor, Sydney R Rooney, Maeve Woeltje, Leo Anthony Celi, Christopher M Horvat
medRxiv : the preprint server for health sciences. Published 2024-11-02. DOI: 10.1101/2024.10.21.24315543. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11537317/pdf/

Abstract

Objective: To challenge clinicians and informaticians to learn about potential sources of bias in medical machine learning models through investigation of data and predictions from an open-source severity of illness score.

Methods: Over a two-day period (total elapsed time approximately 28 hours), we conducted a datathon that challenged interdisciplinary teams to investigate potential sources of bias in the Global Open Source Severity of Illness Score (GOSSIS-1). Teams were invited to develop hypotheses, to use tools of their choosing to identify potential sources of bias, and to provide a final report.

Results: Five teams participated, three of which included both informaticians and clinicians. Most (4/5) used Python for analyses; the remaining team used R. Common analysis themes included the relationship of the GOSSIS-1 prediction score with demographic and care-related variables; relationships between demographics and outcomes; calibration and factors related to the context of care; and the impact of missingness. Representativeness of the population, differences in calibration and model performance among groups, and differences in performance across hospital settings were identified as possible sources of bias.
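One of the analysis themes above, comparing calibration of a severity score across demographic groups, can be sketched in Python (the language most teams used). This is a minimal illustration on synthetic stand-in data, not the teams' actual code or the GOSSIS-1 dataset; the column names (`pred`, `died`, `group`) and the injected miscalibration are hypothetical.

```python
# Hedged sketch of a subgroup calibration check on synthetic stand-in data.
# Column names and the miscalibration mechanism are hypothetical, not GOSSIS-1.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "pred": rng.uniform(0.01, 0.6, n),    # predicted mortality risk
    "group": rng.choice(["A", "B"], n),   # demographic group label
})
# Inject miscalibration: in group B the true risk is lower than predicted.
true_risk = np.where(df["group"] == "B", df["pred"] * 0.7, df["pred"])
df["died"] = rng.random(n) < true_risk    # observed outcome

def calibration_gap(sub, bins=5):
    """Mean |predicted - observed| mortality across risk-quantile bins."""
    binned = pd.qcut(sub["pred"], bins)
    tab = sub.groupby(binned, observed=True).agg(
        mean_pred=("pred", "mean"), obs_rate=("died", "mean"))
    return (tab["mean_pred"] - tab["obs_rate"]).abs().mean()

gaps = {g: calibration_gap(sub) for g, sub in df.groupby("group")}
for g, gap in sorted(gaps.items()):
    print(f"group {g}: mean calibration gap = {gap:.3f}")
```

Under these synthetic assumptions the gap for group B comes out noticeably larger than for group A, flagging that subgroup for closer inspection; this mirrors the kind of group-wise calibration comparison the Results describe.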

Discussion: Datathons are a promising approach for challenging developers and users to explore questions relating to unrecognized biases in medical machine learning algorithms.

Conclusions: Vulnerable populations may be adversely affected by biased medical machine learning models. To avoid these adverse outcomes, developers and users must understand the challenges involved in identifying potential biases. We held a datathon designed to give diverse participants the opportunity to explore an open-source patient severity model and identify potential biases. Five teams of clinicians and informaticians used tools of their own choosing to assess possible sources of bias, applying a range of analytic techniques and exploring multiple features. By giving diverse participants hands-on engagement with meaningful data, datathons have the potential to raise awareness of potential biases and to promote best practices for developing fair and equitable medical machine learning models.