Journal of the American Medical Informatics Association: Latest Articles

Pregnancy episodes in All of Us: harnessing multi-source data for pregnancy-related research.
IF 4.7; Zone 2 (Medicine); Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-07-24; DOI: 10.1093/jamia/ocae195
Louisa H Smith, Wanjiang Wang, Brianna Keefe-Oates

Objectives: The National Institutes of Health's All of Us Research Program addresses gaps in biomedical research by collecting health data from diverse populations. Pregnant individuals have historically been underrepresented in biomedical research, and pregnancy-related research is often limited by data availability, sample size, and inadequate representation of the diversity of pregnant people. All of Us integrates a wealth of health-related data, providing a unique opportunity to conduct comprehensive pregnancy-related research. We aimed to identify pregnancy episodes with high-quality electronic health record (EHR) data in All of Us Research Program data and evaluate the program's utility for pregnancy-related research.

Materials and methods: We used a previously published algorithm to identify pregnancy episodes in All of Us EHR data. We described these pregnancies, validated them with All of Us survey data, and compared them to national statistics.

Results: Our study identified 18 970 pregnancy episodes from 14 234 participants; other possible pregnancy episodes had low-quality or insufficient data. Validation against people who reported a current pregnancy on an All of Us survey found low false positive and negative rates. Demographics were similar in some respects to national data; however, Asian-Americans were underrepresented, and older, highly educated pregnant people were overrepresented.

Discussion: Our approach demonstrates the capacity of All of Us to support pregnancy research and reveals the diversity of the pregnancy cohort. However, some demographic groups were underrepresented. Other limitations include measurement error in gestational age and limited data on non-live births.

Conclusion: The wide variety of data in the All of Us program, encompassing EHR, survey, genomic, and fitness tracker data, offers a valuable resource for studying pregnancy, yet care must be taken to avoid biases.

Citations: 0
Model-based estimation of individual-level social determinants of health and its applications in All of Us.
IF 4.7; Zone 2 (Medicine); Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-07-14; DOI: 10.1093/jamia/ocae168
Bo Young Kim, Rebecca Anthopolos, Hyungrok Do, Judy Zhong

Objectives: We introduce a widely applicable model-based approach for estimating individual-level Social Determinants of Health (SDoH) and evaluate its effectiveness using the All of Us Research Program.

Materials and methods: Our approach utilizes aggregated SDoH datasets to estimate individual-level SDoH, demonstrated with examples of no high school diploma (NOHSDP) and no health insurance (UNINSUR) variables. Models are estimated using American Community Survey data and applied to derive individual-level estimates for All of Us participants. We assess concordance between model-based SDoH estimates and self-reported SDoHs in All of Us and examine associations with undiagnosed hypertension and diabetes.

Results: Compared to self-reported SDoHs, the area under the curve for NOHSDP is 0.727 (95% CI, 0.724-0.730) and for UNINSUR is 0.730 (95% CI, 0.727-0.733) among the 329 074 All of Us participants, both significantly higher than aggregated SDoHs. The associations between model-based NOHSDP and undiagnosed hypertension are concordant with those estimated using self-reported NOHSDP, with a correlation coefficient of 0.649. Similarly, the associations between model-based NOHSDP and undiagnosed diabetes are concordant with those estimated using self-reported NOHSDP, with a correlation coefficient of 0.900.
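
As a hedged illustration of the concordance metric reported here, the area under the curve can be read as the probability that a randomly chosen participant who self-reports the condition receives a higher model-based estimate than one who does not, with ties counted as one half. The function name `auc` and the toy data are hypothetical; this is a minimal sketch of the metric, not the authors' code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count one half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: model-based estimates and self-reported status
estimates = [0.9, 0.7, 0.4, 0.35, 0.2, 0.1]
self_reported = [1, 1, 0, 1, 0, 0]
toy_auc = auc(estimates, self_reported)
```

The pairwise form is quadratic in sample size, so it is only suitable for small illustrations; at the scale of the cohort above, a rank-based implementation would be the practical choice.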

Discussion and conclusion: The model-based SDoH estimation method offers a scalable and easily standardized approach for estimating individual-level SDoHs. Using the All of Us dataset, we demonstrate reasonable concordance between model-based SDoH estimates and self-reported SDoHs, along with consistent associations with health outcomes. Our findings also underscore the critical role of geographic contexts in SDoH estimation and in evaluating the association between SDoHs and health outcomes.

Citations: 0
Research to classrooms: a co-designed curriculum brings All of Us data to secondary schools.
IF 4.7; Zone 2 (Medicine); Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-07-09; DOI: 10.1093/jamia/ocae167
Louisa A Stark, Kristin E Fenker, Harini Krishnan, Molly Malone, Rebecca J Peterson, Regina Cowan, Jeremy Ensrud, Hector Gamboa, Crstina Gayed, Patricia Refino, Tia Tolk, Teresa Walters, Yong Crosby, Rubin Baskir

Objectives: We describe new curriculum materials for engaging secondary school students in exploring the "big data" in the NIH All of Us Research Program's Public Data Browser and the co-design processes used to collaboratively develop the materials. We also describe the methods used to develop and validate assessment items for studying the efficacy of the materials for student learning as well as preliminary findings from these studies.

Materials and methods: Secondary-level biology teachers from across the United States participated in a 2.5-day Co-design Summer Institute. After learning about the All of Us Research Program and its Data Browser, they collaboratively developed learning objectives and initial ideas for learning experiences related to exploring the Data Browser and big data. The Genetic Science Learning Center team at the University of Utah further developed the educators' ideas. Additional teachers and their students participated in classroom pilot studies to validate a 22-item instrument that assesses students' knowledge. Educators completed surveys about the materials and their experiences.

Results: The "Exploring Big Data with the All of Us Data Browser" curriculum module includes 3 data exploration guides that engage students in using the Data Browser, 3 related multimedia pieces, and teacher support materials. Pilot testing showed substantial growth in students' understanding of key big data concepts and research applications.

Discussion and conclusion: Our co-design process provides a model for educator engagement. The new curriculum module serves as a model for introducing secondary students to big data and precision medicine research by exploring diverse real-world datasets.

Citations: 0
Use of calibration to improve the precision of estimates obtained from All of Us data.
IF 4.7; Zone 2 (Medicine); Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-07-09; DOI: 10.1093/jamia/ocae181
Vivian Hsing-Chun Wang, Julie Holm, José A Pagán

Objectives: To highlight the use of calibration weighting to improve the precision of estimates obtained from All of Us data and increase the return of value to communities from the All of Us Research Program.

Materials and methods: We used All of Us (2017-2022) data and raking to obtain prevalence estimates in two examples: discrimination in medical settings (N = 41 875) and food insecurity (N = 82 266). Weights were constructed using known population proportions (age, sex, race/ethnicity, region of residence, annual household income, and home ownership) from the 2020 National Health Interview Survey.
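
The raking step named in the methods can be sketched as iterative proportional fitting: respondent weights are rescaled one variable at a time until the weighted sample margins match known population proportions. The sketch below is a minimal illustration under stated assumptions. It uses two hypothetical variables (`sex`, `age`) with made-up target proportions, whereas the study calibrated on six variables from the 2020 National Health Interview Survey. The function names `rake` and `share` are also hypothetical.

```python
def rake(rows, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking).

    rows:    list of dicts, one per respondent, mapping variable -> category
    targets: dict mapping variable -> {category: population proportion}
    Returns one weight per respondent; the weights sum to len(rows).
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, props in targets.items():
            # Current weighted total in each category of this variable
            totals = {cat: 0.0 for cat in props}
            for row, w in zip(rows, weights):
                totals[row[var]] += w
            # Rescale so each category's weighted share hits its target
            for i, row in enumerate(rows):
                factor = props[row[var]] * n / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break
    return weights


def share(rows, weights, var, cat):
    """Weighted proportion of respondents whose `var` equals `cat`."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == cat)
    return hit / sum(weights)


# Hypothetical sample of 10 respondents (toy data, not All of Us records)
rows = (
    [{"sex": "F", "age": "18-44"}] * 4
    + [{"sex": "F", "age": "45+"}] * 2
    + [{"sex": "M", "age": "18-44"}] * 1
    + [{"sex": "M", "age": "45+"}] * 3
)
# Hypothetical population margins
targets = {
    "sex": {"F": 0.5, "M": 0.5},
    "age": {"18-44": 0.6, "45+": 0.4},
}
weights = rake(rows, targets)
```

After raking, `share(rows, weights, "sex", "F")` and `share(rows, weights, "age", "18-44")` match the target margins to within numerical tolerance. A weighted prevalence estimate is then the same weighted proportion computed on the outcome variable.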

Results: About 37% of adults experienced discrimination in a medical setting. About 20% of adults who had not seen a doctor reported being food insecure compared with 14% of adults who regularly saw a doctor.

Conclusions: Calibration using raking is cost-effective and may lead to more precise estimates when analyzing All of Us data.

Citations: 0
Fair prediction of 2-year stroke risk in patients with atrial fibrillation.
IF 4.7; Zone 2 (Medicine); Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-07-03; DOI: 10.1093/jamia/ocae170
Jifan Gao, Philip Mar, Zheng-Zheng Tang, Guanhua Chen

Objective: This study aims to develop machine learning models that provide both accurate and equitable predictions of 2-year stroke risk for patients with atrial fibrillation across diverse racial groups.

Materials and methods: Our study utilized structured electronic health records (EHR) data from the All of Us Research Program. Machine learning models (LightGBM) were utilized to capture the relations between stroke risks and the predictors used by the widely recognized CHADS2 and CHA2DS2-VASc scores. We mitigated the racial disparity by creating a representative tuning set, customizing tuning criteria, and setting binary thresholds separately for subgroups. We constructed a hold-out test set that not only supports temporal validation but also includes a larger proportion of Black/African Americans for fairness validation.
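
One of the mitigation steps listed above, setting binary thresholds separately for subgroups, can be sketched as follows. The abstract does not state which criterion the authors used to pick each cutoff. Here Youden's J (sensitivity + specificity - 1) is assumed purely for illustration, and the function names, group labels, and toy scores are hypothetical.

```python
def best_threshold(scores, labels):
    """Cutoff maximizing Youden's J; a score >= cutoff is predicted positive.

    Youden's J is an assumed criterion, not one taken from the abstract.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("each subgroup needs both outcome classes")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def thresholds_by_group(scores, labels, groups):
    """Fit one cutoff per subgroup instead of a single global cutoff."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        out[g] = best_threshold([scores[i] for i in idx],
                                [labels[i] for i in idx])
    return out


# Hypothetical tuning-set predictions for two subgroups, "A" and "B"
scores = [0.9, 0.8, 0.3, 0.2, 0.5, 0.4, 0.15, 0.1]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
cutoffs = thresholds_by_group(scores, labels, groups)
```

In this toy data the model's scores run systematically lower for group B, so a single global cutoff of 0.8 would miss every positive case in B, while the per-group cutoffs separate both groups cleanly.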

Results: Compared to the original CHADS2 and CHA2DS2-VASc scores, significant improvements were achieved by modeling their predictors using machine learning models (the area under the receiver operating characteristic curve rose from near 0.70 to above 0.80). Furthermore, applying our disparity mitigation strategies can effectively enhance model fairness compared to the conventional cross-validation approach.

Discussion: Modeling CHADS2 and CHA2DS2-VASc risk factors with LightGBM and our disparity mitigation strategies achieved decent discriminative performance and excellent fairness performance. In addition, this approach can provide a complete interpretation of each predictor. These properties highlight its potential utility in clinical practice.

Conclusions: Our research presents a practical example of addressing clinical challenges through the All of Us Research Program data. The disparity mitigation framework we proposed is adaptable across various models and data modalities, demonstrating broad potential in clinical informatics.

Citations: 0
Reducing diagnostic delays in acute hepatic porphyria using health records data and machine learning.
IF 4.7; Zone 2 (Medicine); Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-07-01; DOI: 10.1093/jamia/ocae141
Balu Bhasuran, Katharina Schmolly, Yuvraaj Kapoor, Nanditha Lakshmi Jayakumar, Raymond Doan, Jigar Amin, Stephen Meninger, Nathan Cheng, Robert Deering, Karl Anderson, Simon W Beaven, Bruce Wang, Vivek A Rudrapatna

Background: Acute hepatic porphyria (AHP) is a group of rare but treatable conditions associated with diagnostic delays of 15 years on average. The advent of electronic health records (EHR) data and machine learning (ML) may improve the timely recognition of rare diseases like AHP. However, prediction models can be difficult to train given the limited case numbers, unstructured EHR data, and selection biases intrinsic to healthcare delivery. We sought to train and characterize models for identifying patients with AHP.

Methods: This diagnostic study used structured and notes-based EHR data from 2 centers at the University of California, UCSF (2012-2022) and UCLA (2019-2022). The data were split into 2 cohorts (referral and diagnosis) and used to develop models that predict (1) who will be referred for testing of acute porphyria, among those who presented with abdominal pain (a cardinal symptom of AHP), and (2) who will test positive, among those referred. The referral cohort consisted of 747 patients referred for testing and 99 849 contemporaneous patients who were not. The diagnosis cohort consisted of 72 confirmed AHP cases and 347 patients who tested negative. The case cohort was 81% female and 6-75 years old at the time of diagnosis. Candidate models used a range of architectures. Feature selection was semi-automated and incorporated publicly available data from knowledge graphs. Our primary outcome was the F-score on an outcome-stratified test set.
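
For readers unfamiliar with the primary outcome, the F-score combines precision and recall into a single number. The abstract does not say which variant was used, so the sketch below assumes the balanced F1 (beta = 1) by default. The function name and toy labels are hypothetical.

```python
def f_score(y_true, y_pred, beta=1.0):
    """F-beta score from binary labels; beta=1 gives the balanced F1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical test-set labels: 3 cases among 8 patients
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
toy_f1 = f_score(y_true, y_pred)
```

Because the score ignores true negatives, it stays informative when cases are heavily outnumbered, as in the referral cohort described above.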

Results: The best center-specific referral models achieved an F-score of 86%-91%. The best diagnosis model achieved an F-score of 92%. To further test our model, we contacted 372 current patients who lack an AHP diagnosis but were predicted by our models as potentially having it (≥10% probability of referral, ≥50% probability of testing positive). However, we were only able to recruit 10 of these patients for biochemical testing, all of whom were negative. Nonetheless, post hoc evaluations suggested that these models could identify 71% of cases earlier than their diagnosis date, saving 1.2 years.

Conclusions: ML can reduce diagnostic delays in AHP and other rare diseases. Robust recruitment strategies and multicenter coordination will be needed to validate these models before they can be deployed.

Citations: 0
All of whom? Limitations encountered using All of Us Researcher Workbench in a Primary Care residents secondary data analysis research training block.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-06-25 DOI: 10.1093/jamia/ocae162
Fred Willie Zametkin LaPolla, Marco Barber Grossi, Sharon Chen, Tai Wei Guo, Kathryn Havranek, Olivia Jebb, Minh Thu Nguyen, Sneha Panganamamula, Noah Smith, Shree Sundaresh, Jonathan Yu, Gabrielle Mayer

Objectives: The goal of this case report is to describe the experiences and challenges encountered in training Primary Care residents in secondary data analysis using the All of Us Researcher Workbench. At our large, urban safety net hospital, third-year Primary Care/Internal Medicine residents complete a research-intensive block, the Research Practicum, in which they work as a team to conduct secondary data analysis on a dataset with faculty facilitation. In 2023, this research block focused on use of the All of Us Researcher Workbench for secondary data analysis.

Materials and methods: Two groups of 5 residents underwent training to access the All of Us Researcher Workbench, and each group explored available data with a faculty facilitator and generated original research questions. Both groups successfully completed their research blocks and created original presentations on "social isolation and A1C levels" and "medical discrimination and diabetes management."

Results: Departmental faculty were satisfied with the depth of learning and data exploration. In focus groups, some residents noted that for those without interest in performing research, the activity felt extraneous to their career goals, while others were glad for the opportunity to publish. In both blocks, residents highlighted dissatisfaction with the degree to which the All of Us Researcher Workbench was representative of patients they encounter in a large safety net hospital.

Discussion: Using the All of Us Researcher Workbench gave residents an opportunity to explore novel questions in a massive data source. Many residents, however, noted that because the population described in the All of Us Researcher Workbench appeared to be more highly educated and less racially diverse than the patients they encounter in their practice, findings may be hard to generalize to a community health context. Additionally, because working with the data required knowledge of 1 of 2 code-based data analysis languages (R or Python) and work within an idiosyncratic coding environment, residents relied heavily on a faculty facilitator to assist with analysis.

Conclusion: Using the All of Us Researcher Workbench for research training allowed residents to explore novel questions and gain first-hand exposure to opportunities and challenges in secondary data analysis.

Citations: 0
Disparities in ABO blood type determination across diverse ancestries: a systematic review and validation in the All of Us Research Program.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-06-25 DOI: 10.1093/jamia/ocae161
Kiana L Martinez, Andrew Klein, Jennifer R Martin, Chinwuwanuju U Sampson, Jason B Giles, Madison L Beck, Krupa Bhakta, Gino Quatraro, Juvie Farol, Jason H Karnes

Objectives: ABO blood types have widespread clinical use and robust associations with disease. The purpose of this study is to evaluate the portability and suitability of tag single-nucleotide polymorphisms (tSNPs) used to determine ABO alleles and blood types across diverse populations in published literature.

Materials and methods: Bibliographic databases were searched for studies using tSNPs to determine ABO alleles. We calculated linkage between tSNPs and functional variants across inferred continental ancestry groups from 1000 Genomes. We compared r² across ancestry and assessed real-world consequences by comparing tSNP-derived blood types to serology in a diverse population from the All of Us Research Program.
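
As an illustration of the linkage measure mentioned above, the squared allelic correlation between a tag SNP and a functional variant can be computed from phased haplotypes as r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A·p_B. The sketch below is not the authors' pipeline; the abstract does not say which software or procedure was used, and the function name `ld_r2` and the 0/1 allele coding are assumptions made for this example.

```python
import math


def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic sites.

    `haplotypes` is a list of (a, b) pairs, one per phased haplotype, where
    a is the allele at the tag SNP and b the allele at the functional
    variant, each coded 0 or 1 (hypothetical coding for illustration).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at the tag SNP
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at the functional variant
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # joint frequency

    d = p_ab - p_a * p_b                             # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan                              # r^2 is undefined if either site is monomorphic
    return d * d / denom


# A tag SNP that always travels with the functional variant versus one that is independent of it.
perfect = [(1, 1), (1, 1), (0, 0), (0, 0)]
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(perfect), ld_r2(independent))
```

Running the same function separately on haplotypes from each ancestry group would give the kind of per-ancestry r² comparison the abstract describes; a tag SNP with low r² in one group is a poor proxy for the functional variant in that group.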

Results: Linkage between functional variants and O allele tSNPs was significantly lower in African (median r² = 0.443) compared to East Asian (r² = 0.946, P = 1.1 × 10⁻⁵) and European (r² = 0.869, P = .023) populations. In All of Us, discordance between tSNP-derived blood types and serology was high across all SNPs in African ancestry individuals and linkage was strongly correlated with discordance across all ancestries (ρ = -0.90, P = 3.08 × 10⁻²³).

Discussion: Many studies determine ABO blood types using tSNPs. However, tSNPs with low linkage disequilibrium promote misinference of ABO blood types, particularly in diverse populations. We observe common use of inappropriate tSNPs to determine ABO blood type, particularly for O alleles, with some tSNPs mistyping up to 58% of individuals.

Conclusion: Our results highlight the lack of transferability of tSNPs across ancestries and potential exacerbation of disparities in genomic research for underrepresented populations. This is especially relevant as more diverse cohorts are made publicly available.

Citations: 0
Pneumonia diagnosis performance in the emergency department: a mixed-methods study about clinicians' experiences and exploration of individual differences and response to diagnostic performance feedback.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-06-20 DOI: 10.1093/jamia/ocae112
Jorie M Butler, Teresa Taft, Peter Taber, Elizabeth Rutter, Megan Fix, Alden Baker, Charlene Weir, McKenna Nevers, David Classen, Karen Cosby, Makoto Jones, Alec Chapman, Barbara E Jones

Objectives: We sought to (1) characterize the process of diagnosing pneumonia in an emergency department (ED) and (2) examine clinician reactions to a clinician-facing diagnostic discordance feedback tool.

Materials and methods: We designed a diagnostic feedback tool, using electronic health record data from ED clinicians' patients to establish concordance or discordance between ED diagnosis, radiology reports, and hospital discharge diagnosis for pneumonia. We conducted semistructured interviews with 11 ED clinicians about pneumonia diagnosis and reactions to the feedback tool. We administered surveys measuring individual differences in mindset beliefs, comfort with feedback, and feedback tool usability. We qualitatively analyzed interview transcripts and descriptively analyzed survey data.
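
The concordance check described above can be sketched roughly as follows. This is a hypothetical illustration only: the abstract does not specify the tool's data model or its concordance rule, so the `Encounter` fields and the "all three sources agree" rule are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    # Hypothetical fields; each flag records whether that source indicates pneumonia.
    ed_pneumonia: bool          # ED diagnosis
    radiology_pneumonia: bool   # radiology report
    discharge_pneumonia: bool   # hospital discharge diagnosis


def classify(enc):
    """Label an encounter 'concordant' when all three sources agree (assumed rule)."""
    flags = (enc.ed_pneumonia, enc.radiology_pneumonia, enc.discharge_pneumonia)
    if all(flags) or not any(flags):
        return "concordant"
    return "discordant"


def discordance_rate(encounters):
    """Fraction of a clinician's encounters where the sources disagree."""
    if not encounters:
        raise ValueError("need at least one encounter")
    discordant = sum(1 for e in encounters if classify(e) == "discordant")
    return discordant / len(encounters)


cases = [
    Encounter(True, True, True),
    Encounter(True, False, True),
    Encounter(False, False, False),
    Encounter(False, False, True),
]
print(discordance_rate(cases))
```

A per-clinician rate like this is one plausible input to normative feedback, though the skepticism some clinicians expressed about the concordance metric suggests the exact rule matters a great deal.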

Results: Thematic results revealed: (1) the diagnostic process for pneumonia in the ED is characterized by diagnostic uncertainty and may be secondary to goals to treat and dispose the patient; (2) clinician diagnostic self-evaluation is a fragmented, inconsistent process of case review and follow-up, leaving a gap that a feedback tool could fill; (3) the feedback tool was described favorably, with task and normative feedback harnessing clinician values of high-quality patient care and personal excellence; and (4) strong reactions to diagnostic feedback varied from implicit trust to profound skepticism about the validity of the concordance metric. Survey results suggested a relationship between clinicians' individual differences in learning and failure beliefs, feedback experience, and usability ratings.

Discussion and conclusion: Clinicians value feedback on pneumonia diagnoses. Our results highlight the importance of feedback about diagnostic performance and suggest directions for considering individual differences in feedback tool design and implementation.

Citations: 0
Deep learning with noisy labels in medical prediction problems: a scoping review.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-06-20 DOI: 10.1093/jamia/ocae108
Yishu Wei, Yu Deng, Cong Sun, Mingquan Lin, Hongmei Jiang, Yifan Peng

Objectives: Medical research faces substantial challenges from noisy labels attributed to factors like inter-expert variability and machine-extracted labels. Despite this, the adoption of label noise management remains limited, and label noise is largely ignored. To this end, there is a critical need to conduct a scoping review focusing on the problem space. This scoping review aims to comprehensively review label noise management in deep learning-based medical prediction problems, which includes label noise detection, label noise handling, and evaluation. Research involving label uncertainty is also included.

Methods: Our scoping review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched 4 databases, including PubMed, IEEE Xplore, Google Scholar, and Semantic Scholar. Our search terms include "noisy label AND medical/healthcare/clinical," "uncertainty AND medical/healthcare/clinical," and "noise AND medical/healthcare/clinical."

Results: A total of 60 papers published between 2016 and 2023 met the inclusion criteria. A series of practical questions in medical research are investigated. These include the sources of label noise, the impact of label noise, the detection of label noise, label noise handling techniques, and their evaluation. Categorizations of both label noise detection methods and handling techniques are provided.

Discussion: From a methodological perspective, we observe that the medical community has been up to date with the broader deep-learning community, given that most techniques have been evaluated on medical data. We recommend considering label noise as a standard element in medical research, even if it is not dedicated to handling noisy labels. Initial experiments can start with easy-to-implement methods, such as noise-robust loss functions, weighting, and curriculum learning.
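
To make the "noise-robust loss functions" suggestion concrete, here is a minimal sketch of one well-known example, the generalized cross-entropy loss, L_q = (1 − p^q) / q, where p is the predicted probability of the recorded label. The abstract does not name a specific loss, so this choice, the function names, and the default q = 0.7 are assumptions made for illustration. As q approaches 0 the loss approaches ordinary cross-entropy; at q = 1 it equals 1 − p.

```python
import math


def cross_entropy(p_label):
    """Standard cross-entropy for one example; unbounded as p_label -> 0."""
    return -math.log(p_label)


def generalized_cross_entropy(p_label, q=0.7):
    """Generalized cross-entropy for one example.

    p_label is the model's predicted probability for the recorded label.
    The loss is bounded above by 1/q, so a confidently contradicted (and
    possibly mislabeled) example cannot dominate the training signal.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    return (1.0 - p_label ** q) / q


def mean_loss(probabilities, loss_fn):
    """Average a per-example loss over a batch of predicted probabilities."""
    return sum(loss_fn(p) for p in probabilities) / len(probabilities)


# Three well-fit examples plus one that the model strongly contradicts,
# as might happen when the recorded label is wrong.
batch = [0.9, 0.8, 0.95, 0.01]
print(mean_loss(batch, cross_entropy))
print(mean_loss(batch, generalized_cross_entropy))
```

With standard cross-entropy the single contradicted example contributes most of the batch loss, whereas the bounded loss caps its contribution at 1/q. The trade-off is that hard but correctly labeled examples are also down-weighted, so q is usually treated as a tuning parameter.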

Citations: 0