The Data Artifacts Glossary: a community-based repository for bias on health datasets.

IF 9 · Zone 2 (Medicine) · Q1 Cell Biology · Journal of Biomedical Science · Pub Date: 2025-02-04 · DOI: 10.1186/s12929-024-01106-6
Rodrigo R Gameiro, Naira Link Woite, Christopher M Sauer, Sicheng Hao, Chrystinne Oliveira Fernandes, Anna E Premo, Alice Rangel Teixeira, Isabelle Resli, An-Kwok Ian Wong, Leo Anthony Celi
Journal of Biomedical Science 32(1):14. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11792693/pdf/

Abstract

Background: The deployment of Artificial Intelligence (AI) in healthcare has the potential to transform patient care through improved diagnostics, personalized treatment plans, and more efficient resource management. However, the effectiveness and fairness of AI are critically dependent on the data it learns from. Biased datasets can lead to AI outputs that perpetuate disparities, particularly affecting social minorities and marginalized groups.

Objective: This paper introduces the "Data Artifacts Glossary", a dynamic, open-source framework designed to systematically document and update potential biases in healthcare datasets. The aim is to provide a comprehensive tool that enhances the transparency and accuracy of AI applications in healthcare and contributes to understanding and addressing health inequities.

Methods: Utilizing a methodology inspired by the Delphi method, a diverse team of experts conducted iterative rounds of discussions and literature reviews. The team synthesized insights to develop a comprehensive list of bias categories and designed the glossary's structure. The Data Artifacts Glossary was piloted using the MIMIC-IV dataset to validate its utility and structure.

Results: The Data Artifacts Glossary adopts a collaborative approach modeled on successful open-source projects like Linux and Python. Hosted on GitHub, it uses version control and collaborative features that allow stakeholders from diverse backgrounds to contribute. A peer review process managed by community members supports the continual refinement and accuracy of its contents. The implementation of the Data Artifacts Glossary with the MIMIC-IV dataset illustrates its utility: it categorizes biases and facilitates their identification and understanding.
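
The idea of a glossary that categorizes documented biases per dataset can be illustrated with a minimal Python sketch. This is not the glossary's actual schema: the `GlossaryEntry` class, its field names, the `group_by_category` helper, and the placeholder category labels and descriptions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    """One documented artifact in a dataset (hypothetical field names)."""
    dataset: str        # dataset the artifact was observed in, e.g. "MIMIC-IV"
    category: str       # bias category label assigned by contributors
    description: str    # what the artifact is and how it could skew a model
    sources: tuple = () # optional supporting references


def group_by_category(entries):
    """Return {category: [entries]} so biases can be browsed by type."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.category].append(entry)
    return dict(grouped)


# Placeholder entries only; these are not real findings about MIMIC-IV.
entries = [
    GlossaryEntry("MIMIC-IV", "placeholder-category-a", "Placeholder description 1."),
    GlossaryEntry("MIMIC-IV", "placeholder-category-b", "Placeholder description 2."),
    GlossaryEntry("MIMIC-IV", "placeholder-category-a", "Placeholder description 3."),
]

grouped = group_by_category(entries)
for category, items in sorted(grouped.items()):
    print(category, len(items))
```

Keeping each entry as a small immutable record would suit a version-controlled workflow, since each contribution could then be reviewed as a discrete addition or change.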

Conclusion: The Data Artifacts Glossary serves as a vital resource for enhancing the integrity of AI applications in healthcare by providing a mechanism to recognize and mitigate dataset biases before they impact AI outputs. It not only aids in avoiding bias in model development but also contributes to understanding and addressing the root causes of health disparities.
