Demographic reporting in biosignal datasets: a comprehensive analysis of the PhysioNet open access database.

Lancet Digital Health (IF 23.8; Region 1, Medicine; Q1, Medical Informatics). Pub Date: 2024-10-01. DOI: 10.1016/S2589-7500(24)00170-5
Sarah Jiang, Perisa Ashar, Md Mobashir Hasan Shandhi, Jessilyn Dunn

Abstract

The PhysioNet open access database (PND) is one of the world's largest and most comprehensive repositories of biosignal data and is widely used by researchers to develop, train, and validate algorithms. To contextualise the results of such algorithms, understanding the underlying demographic distribution of the data is crucial, specifically the race, ethnicity, sex or gender, and age of study participants. We sought to understand the underlying reporting patterns and characteristics of the demographic data of the datasets available on PND. Of the 181 unique datasets present in the PND as of July 6, 2023, 175 involved human participants, with less than 7% of studies reporting on all four of the key demographic variables. Furthermore, we found a higher rate of reporting sex or gender and age than race and ethnicity. In the studies that did include participant sex or gender, the samples were mostly male. Additionally, we found that most studies were done in North America, particularly in the USA. These imbalances and poor reporting of representation raise concerns regarding potential embedded biases in the algorithms that rely on these datasets. They also underscore the need for universal and comprehensive reporting practices to ensure equitable development and deployment of artificial intelligence and machine learning tools in medicine.

Source journal
CiteScore: 41.20
Self-citation rate: 1.60%
Articles published: 232
Review time: 13 weeks
About the journal: The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health. The journal's open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health. We publish a range of content types including Articles, Review, Comment, and Correspondence, contributing to promoting digital technologies in health practice worldwide.