Sarah Jiang, Perisa Ashar, Md Mobashir Hasan Shandhi, Jessilyn Dunn
Lancet Digital Health, volume 6, issue 11, pages e871-e878. Journal Article. Published 2024-10-01. DOI: 10.1016/S2589-7500(24)00170-5. URL: https://www.sciencedirect.com/science/article/pii/S2589750024001705
Demographic reporting in biosignal datasets: a comprehensive analysis of the PhysioNet open access database
The PhysioNet open access database (PND) is one of the world's largest and most comprehensive repositories of biosignal data and is widely used by researchers to develop, train, and validate algorithms. To contextualise the results of such algorithms, understanding the underlying demographic distribution of the data is crucial—specifically, the race, ethnicity, sex or gender, and age of study participants. We sought to understand the underlying reporting patterns and characteristics of the demographic data of the datasets available on PND. Of the 181 unique datasets present in the PND as of July 6, 2023, 175 involved human participants, with less than 7% of studies reporting on all four of the key demographic variables. Furthermore, we found a higher rate of reporting sex or gender and age than race and ethnicity. In the studies that did include participant sex or gender, the samples were mostly male. Additionally, we found that most studies were done in North America, particularly in the USA. These imbalances and poor reporting of representation raise concerns regarding potential embedded biases in the algorithms that rely on these datasets. They also underscore the need for universal and comprehensive reporting practices to ensure equitable development and deployment of artificial intelligence and machine learning tools in medicine.
About the journal:
The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health.
The journal's open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health.
We publish a range of content types, including Articles, Review, Comment, and Correspondence, helping to promote digital technologies in health practice worldwide.