Marie-Laure Charpignon, Leo Anthony Celi, Marisa Cobanaj, Rene Eber, Amelia Fiske, Jack Gallifant, Chenyu Li, Gurucharan Lingamallu, Anton Petushkov, Robin Pierce
{"title":"多样性和包容性:开放数据的一个隐性额外优势","authors":"Marie-Laure Charpignon, Leo Anthony Celi, Marisa Cobanaj, Rene Eber, Amelia Fiske, Jack Gallifant, Chenyu Li, Gurucharan Lingamallu, Anton Petushkov, Robin Pierce","doi":"10.1101/2024.03.17.24304443","DOIUrl":null,"url":null,"abstract":"The recent imperative by the National Institutes of Health to share scientific data publicly underscores a significant shift in academic research. Effective as of January 2023, it emphasizes that transparency in data collection and dedicated efforts towards data sharing are prerequisites for translational research, from the lab to the bedside. Given the role of data access in mitigating potential bias in clinical models, we hypothesize that researchers who leverage open-access datasets rather than privately-owned ones are more diverse. In this brief report, we proposed to test this hypothesis in the transdisciplinary and expanding field of artificial intelligence (AI) for critical care. Specifically, we compared the diversity among authors of publications leveraging open datasets, such as the commonly used MIMIC and eICU databases, with that among authors of publications relying exclusively on private datasets, unavailable to other research investigators (e.g., electronic health records from ICU patients accessible only to Mayo Clinic analysts). To measure the extent of author diversity, we characterized gender balance as well as the presence of researchers from low- and middle-income countries (LMIC) and minority-serving institutions (MSI). Our comparative analysis revealed a greater contribution of authors from LMICs and MSIs among researchers leveraging open critical care datasets than among those relying exclusively on private data resources. The participation of women was similar between the two groups, albeit slightly larger in the former. Notably, although over 70% of all articles included at least one author inferred to be a woman, less than 25% had a woman as a first or last author. Importantly, we found that the proportion of authors from LMICs was substantially higher in the treatment than in the control group (10.1% vs. 6.2%, p<0.001), including as first and last authors. Moreover, we found that the proportion of US-based authors affiliated with a MSI was 1.5 times higher among articles in the treatment than in the control group, suggesting that open data resources attract a larger pool of participants from minority groups (8.6% vs. 5.6%, p<0.001).Thus, our study highlights the valuable contribution of the Open Data strategy to underrepresented groups, while also quantifying persisting gender gaps in academic and clinical research at the intersection of computer science and healthcare. In doing so, we hope our work points to the importance of extending open data practices in deliberate and systematic ways.","PeriodicalId":501249,"journal":{"name":"medRxiv - Intensive Care and Critical Care Medicine","volume":"51 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Diversity and inclusion: A hidden additional benefit of Open Data\",\"authors\":\"Marie-Laure Charpignon, Leo Anthony Celi, Marisa Cobanaj, Rene Eber, Amelia Fiske, Jack Gallifant, Chenyu Li, Gurucharan Lingamallu, Anton Petushkov, Robin Pierce\",\"doi\":\"10.1101/2024.03.17.24304443\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The recent imperative by the National Institutes of Health to share scientific data publicly underscores a significant shift in academic research. Effective as of January 2023, it emphasizes that transparency in data collection and dedicated efforts towards data sharing are prerequisites for translational research, from the lab to the bedside. Given the role of data access in mitigating potential bias in clinical models, we hypothesize that researchers who leverage open-access datasets rather than privately-owned ones are more diverse. In this brief report, we proposed to test this hypothesis in the transdisciplinary and expanding field of artificial intelligence (AI) for critical care. Specifically, we compared the diversity among authors of publications leveraging open datasets, such as the commonly used MIMIC and eICU databases, with that among authors of publications relying exclusively on private datasets, unavailable to other research investigators (e.g., electronic health records from ICU patients accessible only to Mayo Clinic analysts). To measure the extent of author diversity, we characterized gender balance as well as the presence of researchers from low- and middle-income countries (LMIC) and minority-serving institutions (MSI). Our comparative analysis revealed a greater contribution of authors from LMICs and MSIs among researchers leveraging open critical care datasets than among those relying exclusively on private data resources. The participation of women was similar between the two groups, albeit slightly larger in the former. Notably, although over 70% of all articles included at least one author inferred to be a woman, less than 25% had a woman as a first or last author. Importantly, we found that the proportion of authors from LMICs was substantially higher in the treatment than in the control group (10.1% vs. 6.2%, p<0.001), including as first and last authors. Moreover, we found that the proportion of US-based authors affiliated with a MSI was 1.5 times higher among articles in the treatment than in the control group, suggesting that open data resources attract a larger pool of participants from minority groups (8.6% vs. 5.6%, p<0.001).Thus, our study highlights the valuable contribution of the Open Data strategy to underrepresented groups, while also quantifying persisting gender gaps in academic and clinical research at the intersection of computer science and healthcare. In doing so, we hope our work points to the importance of extending open data practices in deliberate and systematic ways.\",\"PeriodicalId\":501249,\"journal\":{\"name\":\"medRxiv - Intensive Care and Critical Care Medicine\",\"volume\":\"51 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"medRxiv - Intensive Care and Critical Care Medicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1101/2024.03.17.24304443\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv - Intensive Care and Critical Care Medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.03.17.24304443","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
美国国立卫生研究院(National Institutes of Health)最近要求公开共享科学数据,这凸显了学术研究的重大转变。从 2023 年 1 月开始生效的这一规定强调,数据收集的透明度和为数据共享所做的不懈努力是转化研究(从实验室到床边)的先决条件。鉴于数据访问在减少临床模型中潜在偏差方面的作用,我们假设,利用开放访问数据集而不是私有数据集的研究人员更加多元化。在这份简短的报告中,我们提议在跨学科和不断扩展的重症监护人工智能(AI)领域验证这一假设。具体来说,我们比较了利用开放数据集(如常用的 MIMIC 和 eICU 数据库)发表文章的作者与完全依赖私人数据集(如只有梅奥诊所分析师才能访问的 ICU 患者电子健康记录)发表文章的作者之间的多样性。为了衡量作者的多样性程度,我们对性别平衡以及来自中低收入国家(LMIC)和少数民族服务机构(MSI)的研究人员进行了分析。我们的比较分析表明,在利用开放式重症监护数据集的研究人员中,来自中低收入国家(LMIC)和少数族裔服务机构(MSI)的作者比那些完全依赖于私有数据资源的研究人员有更大的贡献。两组研究人员中女性的参与度相似,只是前者略高。值得注意的是,尽管超过 70% 的文章中至少有一位作者被推断为女性,但只有不到 25% 的文章的第一或最后作者为女性。重要的是,我们发现治疗组中来自低收入国家的作者比例大大高于对照组(10.1% vs. 6.2%,p<0.001),包括第一作者和最后作者。此外,我们还发现,在治疗组的文章中,隶属于 MSI 的美国作者比例是对照组的 1.5 倍,这表明开放数据资源吸引了更多来自少数群体的参与者(8.6% vs. 5.6%,p<0.001)。因此,我们的研究强调了开放数据战略对代表性不足群体的宝贵贡献,同时也量化了计算机科学与医疗保健交叉领域的学术和临床研究中持续存在的性别差距。因此,我们希望我们的工作能指出以有意识和系统化的方式扩展开放数据实践的重要性。
Diversity and inclusion: A hidden additional benefit of Open Data
The recent imperative by the National Institutes of Health to share scientific data publicly underscores a significant shift in academic research. Effective as of January 2023, it emphasizes that transparency in data collection and dedicated efforts towards data sharing are prerequisites for translational research, from the lab to the bedside. Given the role of data access in mitigating potential bias in clinical models, we hypothesize that researchers who leverage open-access datasets rather than privately-owned ones are more diverse. In this brief report, we proposed to test this hypothesis in the transdisciplinary and expanding field of artificial intelligence (AI) for critical care. Specifically, we compared the diversity among authors of publications leveraging open datasets, such as the commonly used MIMIC and eICU databases, with that among authors of publications relying exclusively on private datasets, unavailable to other research investigators (e.g., electronic health records from ICU patients accessible only to Mayo Clinic analysts). To measure the extent of author diversity, we characterized gender balance as well as the presence of researchers from low- and middle-income countries (LMIC) and minority-serving institutions (MSI). Our comparative analysis revealed a greater contribution of authors from LMICs and MSIs among researchers leveraging open critical care datasets than among those relying exclusively on private data resources. The participation of women was similar between the two groups, albeit slightly larger in the former. Notably, although over 70% of all articles included at least one author inferred to be a woman, less than 25% had a woman as a first or last author. Importantly, we found that the proportion of authors from LMICs was substantially higher in the treatment than in the control group (10.1% vs. 6.2%, p<0.001), including as first and last authors. Moreover, we found that the proportion of US-based authors affiliated with a MSI was 1.5 times higher among articles in the treatment than in the control group, suggesting that open data resources attract a larger pool of participants from minority groups (8.6% vs. 5.6%, p<0.001).Thus, our study highlights the valuable contribution of the Open Data strategy to underrepresented groups, while also quantifying persisting gender gaps in academic and clinical research at the intersection of computer science and healthcare. In doing so, we hope our work points to the importance of extending open data practices in deliberate and systematic ways.