大型语言模型的固有偏差:随机抽样分析

Noel F. Ayoub MD, MBA , Karthik Balakrishnan MD, MPH , Marc S. Ayoub MD , Thomas F. Barrett MD , Abel P. David MD , Stacey T. Gray MD
{"title":"大型语言模型的固有偏差:随机抽样分析","authors":"Noel F. Ayoub MD, MBA ,&nbsp;Karthik Balakrishnan MD, MPH ,&nbsp;Marc S. Ayoub MD ,&nbsp;Thomas F. Barrett MD ,&nbsp;Abel P. David MD ,&nbsp;Stacey T. Gray MD","doi":"10.1016/j.mcpdig.2024.03.003","DOIUrl":null,"url":null,"abstract":"<div><p>There are mounting concerns regarding inherent bias, safety, and tendency toward misinformation of large language models (LLMs), which could have significant implications in health care. This study sought to determine whether generative artificial intelligence (AI)-based simulations of physicians making life-and-death decisions in a resource-scarce environment would demonstrate bias. Thirteen questions were developed that simulated physicians treating patients in resource-limited environments. Through a random sampling of simulated physicians using OpenAI’s generative pretrained transformer (GPT-4), physicians were tasked with choosing only 1 patient to save owing to limited resources. This simulation was repeated 1000 times per question, representing 1000 unique physicians and patients each. Patients and physicians spanned a variety of demographic characteristics. All patients had similar a priori likelihood of surviving the acute illness. Overall, simulated physicians consistently demonstrated racial, gender, age, political affiliation, and sexual orientation bias in clinical decision-making. Across all demographic characteristics, physicians most frequently favored patients with similar demographic characteristics as themselves, with most pairwise comparisons showing statistical significance (<em>P</em>&lt;.05). Nondescript physicians favored White, male, and young demographic characteristics. The male doctor gravitated toward the male, White, and young, whereas the female doctor typically preferred female, young, and White patients. In addition to saving patients with their own political affiliation, Democratic physicians favored Black and female patients, whereas Republicans preferred White and male demographic characteristics. Heterosexual and gay/lesbian physicians frequently saved patients of similar sexual orientation. Overall, publicly available chatbot LLMs demonstrate significant biases, which may negatively impact patient outcomes if used to support clinical care decisions without appropriate precautions.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 2","pages":"Pages 186-191"},"PeriodicalIF":0.0000,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000208/pdfft?md5=895559f96cdc78e7afbad43c7d8d164a&pid=1-s2.0-S2949761224000208-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Inherent Bias in Large Language Models: A Random Sampling Analysis\",\"authors\":\"Noel F. Ayoub MD, MBA ,&nbsp;Karthik Balakrishnan MD, MPH ,&nbsp;Marc S. Ayoub MD ,&nbsp;Thomas F. Barrett MD ,&nbsp;Abel P. David MD ,&nbsp;Stacey T. Gray MD\",\"doi\":\"10.1016/j.mcpdig.2024.03.003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>There are mounting concerns regarding inherent bias, safety, and tendency toward misinformation of large language models (LLMs), which could have significant implications in health care. This study sought to determine whether generative artificial intelligence (AI)-based simulations of physicians making life-and-death decisions in a resource-scarce environment would demonstrate bias. Thirteen questions were developed that simulated physicians treating patients in resource-limited environments. Through a random sampling of simulated physicians using OpenAI’s generative pretrained transformer (GPT-4), physicians were tasked with choosing only 1 patient to save owing to limited resources. This simulation was repeated 1000 times per question, representing 1000 unique physicians and patients each. Patients and physicians spanned a variety of demographic characteristics. All patients had similar a priori likelihood of surviving the acute illness. Overall, simulated physicians consistently demonstrated racial, gender, age, political affiliation, and sexual orientation bias in clinical decision-making. Across all demographic characteristics, physicians most frequently favored patients with similar demographic characteristics as themselves, with most pairwise comparisons showing statistical significance (<em>P</em>&lt;.05). Nondescript physicians favored White, male, and young demographic characteristics. The male doctor gravitated toward the male, White, and young, whereas the female doctor typically preferred female, young, and White patients. In addition to saving patients with their own political affiliation, Democratic physicians favored Black and female patients, whereas Republicans preferred White and male demographic characteristics. Heterosexual and gay/lesbian physicians frequently saved patients of similar sexual orientation. Overall, publicly available chatbot LLMs demonstrate significant biases, which may negatively impact patient outcomes if used to support clinical care decisions without appropriate precautions.</p></div>\",\"PeriodicalId\":74127,\"journal\":{\"name\":\"Mayo Clinic Proceedings. Digital health\",\"volume\":\"2 2\",\"pages\":\"Pages 186-191\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-04-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2949761224000208/pdfft?md5=895559f96cdc78e7afbad43c7d8d164a&pid=1-s2.0-S2949761224000208-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Mayo Clinic Proceedings. Digital health\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2949761224000208\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mayo Clinic Proceedings. Digital health","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949761224000208","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

人们越来越关注大型语言模型(LLM)的内在偏差、安全性和误导倾向,这可能会对医疗保健产生重大影响。本研究试图确定,在资源匮乏的环境中,基于人工智能(AI)生成的模拟医生生死决策是否会出现偏差。研究人员提出了 13 个问题,模拟医生在资源有限的环境中治疗病人。通过使用 OpenAI 的生成式预训练转换器(GPT-4)对模拟医生进行随机抽样,医生的任务是在资源有限的情况下只选择救治一名病人。每个问题重复模拟 1000 次,每个问题代表 1000 个不同的医生和患者。患者和医生的人口统计学特征各不相同。所有患者在急性病中存活的先验可能性相似。总体而言,模拟医生在临床决策中始终表现出种族、性别、年龄、政治派别和性取向偏见。在所有人口统计学特征中,医生最倾向于选择与自己人口统计学特征相似的患者,大多数配对比较结果显示出统计学意义(P<.05)。无特征的医生偏爱白人、男性和年轻的人口特征。男医生偏爱男性、白人和年轻患者,而女医生通常偏爱女性、年轻和白人患者。民主党医生除了喜欢自己政治派别的病人外,还偏爱黑人和女性病人,而共和党医生则偏爱白人和男性人口特征。异性恋和男同性恋/女同性恋医生经常救治性取向相似的病人。总的来说,公开可用的聊天机器人 LLM 显示出明显的偏见,如果不采取适当的预防措施将其用于支持临床护理决策,可能会对患者的治疗效果产生负面影响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Inherent Bias in Large Language Models: A Random Sampling Analysis

There are mounting concerns regarding inherent bias, safety, and tendency toward misinformation of large language models (LLMs), which could have significant implications in health care. This study sought to determine whether generative artificial intelligence (AI)-based simulations of physicians making life-and-death decisions in a resource-scarce environment would demonstrate bias. Thirteen questions were developed that simulated physicians treating patients in resource-limited environments. Through a random sampling of simulated physicians using OpenAI’s generative pretrained transformer (GPT-4), physicians were tasked with choosing only 1 patient to save owing to limited resources. This simulation was repeated 1000 times per question, representing 1000 unique physicians and patients each. Patients and physicians spanned a variety of demographic characteristics. All patients had similar a priori likelihood of surviving the acute illness. Overall, simulated physicians consistently demonstrated racial, gender, age, political affiliation, and sexual orientation bias in clinical decision-making. Across all demographic characteristics, physicians most frequently favored patients with similar demographic characteristics as themselves, with most pairwise comparisons showing statistical significance (P<.05). Nondescript physicians favored White, male, and young demographic characteristics. The male doctor gravitated toward the male, White, and young, whereas the female doctor typically preferred female, young, and White patients. In addition to saving patients with their own political affiliation, Democratic physicians favored Black and female patients, whereas Republicans preferred White and male demographic characteristics. Heterosexual and gay/lesbian physicians frequently saved patients of similar sexual orientation. Overall, publicly available chatbot LLMs demonstrate significant biases, which may negatively impact patient outcomes if used to support clinical care decisions without appropriate precautions.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Mayo Clinic Proceedings. Digital health
Mayo Clinic Proceedings. Digital health Medicine and Dentistry (General), Health Informatics, Public Health and Health Policy
自引率
0.00%
发文量
0
审稿时长
47 days
期刊最新文献
Developing a Research Center for Artificial Intelligence in Medicine Strategic Considerations for Selecting Artificial Intelligence Solutions for Institutional Integration: A Single-Center Experience Reviewers for Mayo Clinic Proceedings: Digital Health (2024) A Blueprint for Clinical-Driven Medical Device Development: The Feverkidstool Application to Identify Children With Serious Bacterial Infection Cost-Effectiveness of Artificial Intelligence-Enabled Electrocardiograms for Early Detection of Low Ejection Fraction: A Secondary Analysis of the Electrocardiogram Artificial Intelligence-Guided Screening for Low Ejection Fraction Trial
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1