Evaluating and Addressing Demographic Disparities in Medical Large Language Models: A Systematic Review

Mahmud Omar, Vera Sorin, Donald U Apakama, Ali Soroush, Ankit Sakhuja, Robert Freeman, Carol R Horowitz, Lynne D Richardson, Girish Nadkarni, Eyal Klang
DOI: 10.1101/2024.09.09.24313295 (https://doi.org/10.1101/2024.09.09.24313295)
Source: medRxiv - Health Systems and Quality Improvement
Published: 2024-09-09
Citations: 0

Abstract

Background: Large language models (LLMs) are increasingly evaluated for use in healthcare. However, concerns about their impact on disparities persist. This study reviews current research on demographic biases in LLMs to identify prevalent bias types, assess measurement methods, and evaluate mitigation strategies.

Methods: We conducted a systematic review, searching publications from January 2018 to July 2024 across five databases. We included peer-reviewed studies evaluating demographic biases in LLMs, focusing on gender, race, ethnicity, age, and other factors. Study quality was assessed using the Joanna Briggs Institute Critical Appraisal Tools.

Results: Our review included 24 studies. Of these, 22 (91.7%) identified biases in LLMs. Gender bias was the most prevalent, reported in 15 of 16 studies (93.7%). Racial or ethnic biases were observed in 10 of 11 studies (90.9%). Only two studies found minimal or no bias in certain contexts. Mitigation strategies mainly included prompt engineering, with varying effectiveness. However, these findings are tempered by a potential publication bias, as studies with negative results are less frequently published.

Conclusion: Biases are observed in LLMs across various medical domains. While bias detection is improving, effective mitigation strategies are still developing. As LLMs increasingly influence critical decisions, addressing these biases and their resultant disparities is essential for ensuring fair AI systems. Future research should focus on a wider range of demographic factors, intersectional analyses, and non-Western cultural contexts.
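
The proportions in the Results can be rechecked with a few lines of arithmetic. The following is a minimal sketch that recomputes each percentage from the study counts stated in the abstract; the function name `share` and the dictionary `reported_counts` are illustrative and do not come from the paper. Note that 15 of 16 is exactly 93.75%, which the abstract reports as 93.7%.

```python
def share(count: int, total: int) -> float:
    """Return count/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Counts as stated in the abstract: (studies reporting bias, studies assessed).
# The keys are illustrative labels, not terms defined by the authors.
reported_counts = {
    "any bias": (22, 24),
    "gender bias": (15, 16),
    "racial or ethnic bias": (10, 11),
}

for label, (count, total) in reported_counts.items():
    # Print with two decimals so rounding differences are visible.
    print(f"{label}: {count}/{total} = {share(count, total):.2f}%")
```

Printing two decimals rather than one makes it easy to see where a reported figure was rounded or truncated.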