Representation of intensivists’ race/ethnicity, sex, and age by artificial intelligence: a cross-sectional study of two text-to-image models

IF 9.3 | Zone 1, Medicine | JCR Q1, Critical Care Medicine | Critical Care | Pub Date: 2024-11-11 | DOI: 10.1186/s13054-024-05134-4

Mia Gisselbaek, Mélanie Suppan, Laurens Minsart, Ekin Köselerli, Sheila Nainan Myatra, Idit Matot, Odmara L. Barreto Chang, Sarah Saxena, Joana Berger-Estilita



Abstract

Integrating artificial intelligence (AI) into intensive care practice can enhance patient care by providing real-time predictions and aiding clinical decisions. However, biases in AI models can undermine diversity, equity, and inclusion (DEI) efforts, particularly in visual representations of healthcare professionals. This work examines the demographic representation produced by two AI text-to-image models, Midjourney and ChatGPT DALL-E 2, and assesses their accuracy in depicting the demographic characteristics of intensivists. This cross-sectional study, conducted from May to July 2024, used demographic data from the USA workforce report (2022) and intensive care trainees (2021) to compare real-world intensivist demographics with images generated by two AI models, Midjourney v6.0 and ChatGPT 4.0 DALL-E 2. A total of 1,400 images were generated across ICU subspecialties; the outcomes were the sex, race/ethnicity, and age representation in AI-generated images compared with actual workforce demographics. Both AI models showed noticeable biases relative to the actual U.S. intensive care workforce data, notably overrepresenting White and young doctors. ChatGPT DALL-E 2 produced fewer female (17.3% vs 32.2%, p < 0.0001), more White (61% vs 55.1%, p = 0.002), and younger (53.3% vs 23.9%, p < 0.001) individuals. Midjourney depicted more female (47.6% vs 32.2%, p < 0.001), more White (60.9% vs 55.1%, p = 0.003), and younger (49.3% vs 23.9%, p < 0.001) intensivists. Substantial differences between specialties were observed within both models. Finally, when compared with each other, the two models differed significantly in their portrayal of intensivists. Significant biases in AI images of intensivists generated by ChatGPT DALL-E 2 and Midjourney reflect broader cultural issues and may perpetuate stereotypes of healthcare workers in society. This study highlights the need for an approach that ensures fairness, accountability, transparency, and ethics in AI applications for healthcare.
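To make the proportion comparisons in the abstract concrete, the following is a minimal sketch of one way such a comparison could be tested: a two-sided one-sample z-test of a generated share against the workforce share. The abstract does not say which statistical test the authors used, nor how the 1,400 images were split between the two models, so the test choice, the function name `one_sample_proportion_z_test`, and the figure of 700 images per model are all assumptions made for illustration. The resulting p-values should not be expected to reproduce the published ones.

```python
import math


def one_sample_proportion_z_test(observed_share, reference_share, n_images):
    """Two-sided z-test of an observed proportion against a fixed reference share.

    Hypothetical helper for illustration; not the authors' stated method.
    """
    # Standard error of a proportion under the null (reference) share.
    standard_error = math.sqrt(reference_share * (1.0 - reference_share) / n_images)
    z = (observed_share - reference_share) / standard_error
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# ASSUMPTION: 1,400 images split evenly, 700 per model. The abstract does not state this.
ASSUMED_IMAGES_PER_MODEL = 700

# (generated share, workforce share), both taken from the abstract.
comparisons = {
    "ChatGPT DALL-E 2, female": (0.173, 0.322),
    "ChatGPT DALL-E 2, White": (0.610, 0.551),
    "Midjourney, female": (0.476, 0.322),
    "Midjourney, White": (0.609, 0.551),
}

for label, (generated, workforce) in comparisons.items():
    z, p = one_sample_proportion_z_test(generated, workforce, ASSUMED_IMAGES_PER_MODEL)
    print(f"{label}: z = {z:+.2f}, p = {p:.2g}")
```

The sign of `z` shows the direction of the deviation (negative for underrepresentation, positive for overrepresentation), which is how the same reference share of 32.2% female can be significantly undershot by one model and overshot by the other.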




Source journal

Critical Care (Medicine: Critical Care Medicine)

CiteScore: 20.60
Self-citation rate: 3.30%
Articles published: 348
Review time: 1.5 months

Journal description: Critical Care is an esteemed international medical journal that applies a rigorous peer-review process to maintain its high quality standards. Its primary objective is to improve the healthcare provided to critically ill patients. To that end, the journal focuses on gathering, exchanging, disseminating, and endorsing evidence-based information that is highly relevant to intensivists, with the aim of providing a thorough and inclusive examination of the intensive care field.