Journal of the American Medical Informatics Association: Latest Articles

An evaluation of the All of Us Research Program database to examine cumulative stress.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-26. DOI: 10.1093/jamia/ocae201
Shawna Beese, Demetrius A Abshire, Trey L DeJong, Jason T Carbone

Objectives: To evaluate the NIH All of Us Research Program database as a potential data source for studying allostatic load and stress among adults in the United States (US).

Materials and methods: We evaluated the All of Us database to determine sample size significance for original-10 allostatic load biomarkers, Allostatic Load Index-5 (ALI-5), Allostatic Load Five, and Cohen's Perceived Stress Scale (PSS). We conducted a priori, post hoc, and sensitivity power analyses to determine sample sizes for conducting null hypothesis significance tests.
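
The three kinds of power analysis named above can be sketched as follows. This is a minimal illustration, assuming a two-sided comparison of two equal-sized group means under a normal approximation; the abstract does not say which tests or software the authors used, and the function names and default values (alpha = 0.05, power = 0.80) are assumptions made for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def a_priori_n_per_group(effect_size, alpha=0.05, power=0.80):
    """A priori: participants needed per group to detect Cohen's d = effect_size."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def post_hoc_power(effect_size, n_per_group, alpha=0.05):
    """Post hoc: approximate power achieved with n_per_group participants.

    The contribution of the opposite tail is ignored; it is negligible
    for the effect sizes where power is of practical interest.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return _STD_NORMAL.cdf(noncentrality - z_alpha)


def sensitivity_effect_size(n_per_group, alpha=0.05, power=0.80):
    """Sensitivity: smallest Cohen's d detectable with n_per_group participants."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


if __name__ == "__main__":
    # Hypothetical per-group sizes chosen only to show how the minimum
    # detectable effect shrinks as the sample grows.
    for n in (10, 75, 10_000):
        print(n, round(sensitivity_effect_size(n), 3))
```

An exact calculation based on the noncentral t distribution gives slightly larger sample sizes for small groups, so the approximation is best read as a lower bound.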

Results: The maximum number of responses available for each measure is 21 participants for the original-10 allostatic load biomarkers, 150 for the ALI-5, 22 476 for Allostatic Load Five, and 90 583 for the PSS.

Discussion: The NIH All of Us Research Program is well-suited for studying allostatic load using the Allostatic Load Five and psychological stress using PSS.

Conclusion: Improving biomarker data collection in All of Us will facilitate more nuanced examinations of allostatic load among US adults.

Citations: 0
Engagement with health research summaries via digital communication to All of Us participants.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-25. DOI: 10.1093/jamia/ocae185
Janna Ter Meer, Royan Kamyar, Christina Orlovsky, Ting-Yang Hung, Tamara Benrey, Ethan Dinh-Luong, Giorgio Quer, Julia Moore Vogel

Objective: Summaries of health research can be a complementary way to return value to participants. We assess how research participants engage with summaries via email communication and how this can be improved.

Materials and methods: We look at correlations between demographic subgroups and engagement in a longitudinal dataset of 305 626 participants (77% are classified as underrepresented in biomedical research) from the All of Us Research Program. We compare this against engagement with other program communications and use impact evaluations (N = 421 510) to measure the effect of tailoring communication by (1) eliciting content preferences, (2) Spanish focused content, (3) informational videos, and (4) article content in the email subject line.

Results: Between March 2020 and October 2021, research summaries reached 67% of enrolled participants, outperforming other program communication (60%) and return of results (31%), which have a high uptake rate but have been extended to a subset of eligible participants. While all demographic subgroups engage with research summaries, participants who have higher income or educational attainment, are White, or are older than 45 years open and click content most often. Surfacing article content in the email subject line and Spanish focused content had negative effects on engagement. Video and social media content and eliciting preferences led to a small directional increase in clicks.

Discussion: Further individualization of tailoring efforts may be needed to drive larger engagement effects (eg, delivering multiple articles in line with stated preferences, expanding preference options). Our findings are likely a conservative representation of engagement effects, given the coarseness of our click rate measure.

Conclusions: Health research summaries show promise as a way to return value to research participants, especially if individual-level results cannot be returned. Personalization of communication requires testing to determine whether efforts are having the expected effect.

Citations: 0
allofus: an R package to facilitate use of the All of Us Researcher Workbench.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-24. DOI: 10.1093/jamia/ocae198
Louisa H Smith, Robert Cavanaugh

Objectives: Despite easy-to-use tools like the Cohort Builder, using All of Us Research Program data for complex research questions requires a relatively high level of technical expertise. We aimed to increase research and training capacity and reduce barriers to entry for the All of Us community through an R package, allofus. In this article, we describe functions that address common challenges we encountered while working with All of Us Research Program data, and we demonstrate this functionality with an example of creating a cohort of All of Us participants by synthesizing electronic health record and survey data with time dependencies.

Target audience: All of Us Research Program data are widely available to health researchers. The allofus R package is aimed at a wide range of researchers who wish to conduct complex analyses using best practices for reproducibility and transparency, and who have a range of experience using R. Because the All of Us data are transformed into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), researchers familiar with existing OMOP CDM tools or who wish to conduct network studies in conjunction with other OMOP CDM data will also find value in the package.

Scope: We developed an initial set of functions that solve problems we experienced across survey and electronic health record data in our own research and in mentoring student projects. The package will continue to grow and develop with the All of Us Research Program. The allofus R package can help build community research capacity by increasing access to the All of Us Research Program data, the efficiency of its use, and the rigor and reproducibility of the resulting research.

Citations: 0
Pregnancy episodes in All of Us: harnessing multi-source data for pregnancy-related research.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-24. DOI: 10.1093/jamia/ocae195
Louisa H Smith, Wanjiang Wang, Brianna Keefe-Oates

Objectives: The National Institutes of Health's All of Us Research Program addresses gaps in biomedical research by collecting health data from diverse populations. Pregnant individuals have historically been underrepresented in biomedical research, and pregnancy-related research is often limited by data availability, sample size, and inadequate representation of the diversity of pregnant people. All of Us integrates a wealth of health-related data, providing a unique opportunity to conduct comprehensive pregnancy-related research. We aimed to identify pregnancy episodes with high-quality electronic health record (EHR) data in All of Us Research Program data and evaluate the program's utility for pregnancy-related research.

Materials and methods: We used a previously published algorithm to identify pregnancy episodes in All of Us EHR data. We described these pregnancies, validated them with All of Us survey data, and compared them to national statistics.

Results: Our study identified 18 970 pregnancy episodes from 14 234 participants; other possible pregnancy episodes had low-quality or insufficient data. Validation against people who reported a current pregnancy on an All of Us survey found low false positive and negative rates. Demographics were similar in some respects to national data; however, Asian-Americans were underrepresented, and older, highly educated pregnant people were overrepresented.

Discussion: Our approach demonstrates the capacity of All of Us to support pregnancy research and reveals the diversity of the pregnancy cohort. However, we noted an underrepresentation among some demographics. Other limitations include measurement error in gestational age and limited data on non-live births.

Conclusion: The wide variety of data in the All of Us program, encompassing EHR, survey, genomic, and fitness tracker data, offers a valuable resource for studying pregnancy, yet care must be taken to avoid biases.

Citations: 0
Model-based estimation of individual-level social determinants of health and its applications in All of Us.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-14. DOI: 10.1093/jamia/ocae168
Bo Young Kim, Rebecca Anthopolos, Hyungrok Do, Judy Zhong

Objectives: We introduce a widely applicable model-based approach for estimating individual-level Social Determinants of Health (SDoH) and evaluate its effectiveness using the All of Us Research Program.

Materials and methods: Our approach utilizes aggregated SDoH datasets to estimate individual-level SDoH, demonstrated with examples of no high school diploma (NOHSDP) and no health insurance (UNINSUR) variables. Models are estimated using American Community Survey data and applied to derive individual-level estimates for All of Us participants. We assess concordance between model-based SDoH estimates and self-reported SDoHs in All of Us and examine associations with undiagnosed hypertension and diabetes.

Results: Compared to self-reported SDoHs, the area under the curve for NOHSDP is 0.727 (95% CI, 0.724-0.730) and for UNINSUR is 0.730 (95% CI, 0.727-0.733) among the 329 074 All of Us participants, both significantly higher than aggregated SDoHs. The association between model-based NOHSDP and undiagnosed hypertension is concordant with those estimated using self-reported NOHSDP, with a correlation coefficient of 0.649. Similarly, the association between model-based NOHSDP and undiagnosed diabetes is concordant with those estimated using self-reported NOHSDP, with a correlation coefficient of 0.900.
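
For readers unfamiliar with the concordance metric reported here, the sketch below shows one common way to compute an area under the ROC curve and an approximate 95% confidence interval. It is an illustration only: the abstract does not state how the authors computed their intervals, the Hanley-McNeil standard error is a swapped-in textbook approximation, and the scores and labels are made-up values, not All of Us data.

```python
from math import sqrt


def auc_rank(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation). The pairwise
    loop is quadratic, which is fine for a sketch but not for large cohorts.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate confidence interval using the Hanley-McNeil standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = sqrt(max(variance, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


if __name__ == "__main__":
    # Hypothetical model-based estimates (scores) and self-reported status (labels).
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    auc = auc_rank(scores, labels)
    print(auc, hanley_mcneil_ci(auc, n_pos=2, n_neg=2))
```

With a cohort of several hundred thousand participants the interval becomes very narrow, which is consistent with the tight intervals quoted in the abstract.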

Discussion and conclusion: The model-based SDoH estimation method offers a scalable and easily standardized approach for estimating individual-level SDoHs. Using the All of Us dataset, we demonstrate reasonable concordance between model-based SDoH estimates and self-reported SDoHs, along with consistent associations with health outcomes. Our findings also underscore the critical role of geographic contexts in SDoH estimation and in evaluating the association between SDoHs and health outcomes.

Citations: 0
Research to classrooms: a co-designed curriculum brings All of Us data to secondary schools.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-09. DOI: 10.1093/jamia/ocae167
Louisa A Stark, Kristin E Fenker, Harini Krishnan, Molly Malone, Rebecca J Peterson, Regina Cowan, Jeremy Ensrud, Hector Gamboa, Crstina Gayed, Patricia Refino, Tia Tolk, Teresa Walters, Yong Crosby, Rubin Baskir

Objectives: We describe new curriculum materials for engaging secondary school students in exploring the "big data" in the NIH All of Us Research Program's Public Data Browser and the co-design processes used to collaboratively develop the materials. We also describe the methods used to develop and validate assessment items for studying the efficacy of the materials for student learning as well as preliminary findings from these studies.

Materials and methods: Secondary-level biology teachers from across the United States participated in a 2.5-day Co-design Summer Institute. After learning about the All of Us Research Program and its Data Browser, they collaboratively developed learning objectives and initial ideas for learning experiences related to exploring the Data Browser and big data. The Genetic Science Learning Center team at the University of Utah further developed the educators' ideas. Additional teachers and their students participated in classroom pilot studies to validate a 22-item instrument that assesses students' knowledge. Educators completed surveys about the materials and their experiences.

Results: The "Exploring Big Data with the All of Us Data Browser" curriculum module includes 3 data exploration guides that engage students in using the Data Browser, 3 related multimedia pieces, and teacher support materials. Pilot testing showed substantial growth in students' understanding of key big data concepts and research applications.

Discussion and conclusion: Our co-design process provides a model for educator engagement. The new curriculum module serves as a model for introducing secondary students to big data and precision medicine research by exploring diverse real-world datasets.

Citations: 0
Use of calibration to improve the precision of estimates obtained from All of Us data.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-07-09 DOI: 10.1093/jamia/ocae181
Vivian Hsing-Chun Wang, Julie Holm, José A Pagán

Objectives: To highlight the use of calibration weighting to improve the precision of estimates obtained from All of Us data and increase the return of value to communities from the All of Us Research Program.

Materials and methods: We used All of Us (2017-2022) data and raking to obtain prevalence estimates in two examples: discrimination in medical settings (N = 41 875) and food insecurity (N = 82 266). Weights were constructed using known population proportions (age, sex, race/ethnicity, region of residence, annual household income, and home ownership) from the 2020 National Health Interview Survey.

Results: About 37% of adults experienced discrimination in a medical setting. About 20% of adults who had not seen a doctor reported being food insecure compared with 14% of adults who regularly saw a doctor.

Conclusions: Calibration using raking is cost-effective and may lead to more precise estimates when analyzing All of Us data.
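
The raking step described in the methods can be illustrated with a minimal sketch. This is not the authors' code: the function names `rake` and `weighted_share`, the toy respondents, and the target proportions are all hypothetical, and only two margins are used instead of the six listed in the abstract. The sketch rescales weights one variable at a time until every weighted margin matches its target, and it assumes every category present in the sample has a target proportion.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting of respondent weights.

    rows:    list of dicts mapping variable name -> category (hypothetical data).
    targets: {variable: {category: population proportion}} (hypothetical values).
    Returns one weight per row.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, props in targets.items():
            total = sum(weights)
            # Current weighted total for each category of this variable.
            sums = defaultdict(float)
            for row, w in zip(rows, weights):
                sums[row[var]] += w
            # Scale each category so its weighted share equals the target share.
            factors = {cat: props[cat] * total / sums[cat] for cat in sums}
            weights = [w * factors[row[var]] for row, w in zip(rows, weights)]
            max_shift = max(max_shift, max(abs(f - 1.0) for f in factors.values()))
        # Stop once a full pass over all variables barely changes any weight.
        if max_shift < tol:
            break
    return weights


def weighted_share(rows, weights, var, cat):
    """Weighted proportion of rows whose `var` equals `cat`."""
    total = sum(weights)
    return sum(w for r, w in zip(rows, weights) if r[var] == cat) / total


# Toy sample that over-represents women and under-represents younger adults.
sample = [
    {"sex": "F", "age": "18-44"},
    {"sex": "F", "age": "18-44"},
    {"sex": "F", "age": "45+"},
    {"sex": "F", "age": "45+"},
    {"sex": "M", "age": "18-44"},
    {"sex": "M", "age": "45+"},
]
population = {
    "sex": {"F": 0.5, "M": 0.5},
    "age": {"18-44": 0.6, "45+": 0.4},
}
sample_weights = rake(sample, population)
```

A weighted prevalence estimate would then use `sample_weights` in place of equal weights. In practice an established survey package would usually be preferred over a hand-rolled loop, since it also handles variance estimation.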

Citations: 0
Fair prediction of 2-year stroke risk in patients with atrial fibrillation.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-07-03 DOI: 10.1093/jamia/ocae170
Jifan Gao, Philip Mar, Zheng-Zheng Tang, Guanhua Chen

Objective: This study aims to develop machine learning models that provide both accurate and equitable predictions of 2-year stroke risk for patients with atrial fibrillation across diverse racial groups.

Materials and methods: Our study utilized structured electronic health records (EHR) data from the All of Us Research Program. Machine learning models (LightGBM) were utilized to capture the relations between stroke risks and the predictors used by the widely recognized CHADS2 and CHA2DS2-VASc scores. We mitigated the racial disparity by creating a representative tuning set, customizing tuning criteria, and setting binary thresholds separately for subgroups. We constructed a hold-out test set that not only supports temporal validation but also includes a larger proportion of Black/African Americans for fairness validation.

Results: Compared to the original CHADS2 and CHA2DS2-VASc scores, significant improvements were achieved by modeling their predictors using machine learning models (Area Under the Receiver Operating Characteristic curve from near 0.70 to above 0.80). Furthermore, applying our disparity mitigation strategies can effectively enhance model fairness compared to the conventional cross-validation approach.

Discussion: Modeling CHADS2 and CHA2DS2-VASc risk factors with LightGBM and our disparity mitigation strategies achieved decent discriminative performance and excellent fairness performance. In addition, this approach can provide a complete interpretation of each predictor. These findings highlight its potential utility in clinical practice.

Conclusions: Our research presents a practical example of addressing clinical challenges through the All of Us Research Program data. The disparity mitigation framework we proposed is adaptable across various models and data modalities, demonstrating broad potential in clinical informatics.
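
One of the mitigation steps named in the methods, setting binary thresholds separately for subgroups, can be sketched as follows. The abstract does not say which criterion the authors used, so this sketch assumes a hypothetical one: each group gets the highest threshold that still reaches a chosen sensitivity. The function names, the `target_tpr` parameter, and the toy scores are illustrative only.

```python
import math


def threshold_for_sensitivity(scores, labels, target_tpr):
    """Highest cut-off whose true-positive rate is at least target_tpr."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive case")
    # Number of positive cases that must be flagged to reach the target rate.
    k = max(1, math.ceil(target_tpr * len(positives)))
    return positives[k - 1]


def group_thresholds(scores, labels, groups, target_tpr=0.8):
    """Compute one threshold per subgroup (assumed criterion: equal sensitivity)."""
    by_group = {}
    for s, y, g in zip(scores, labels, groups):
        group_scores, group_labels = by_group.setdefault(g, ([], []))
        group_scores.append(s)
        group_labels.append(y)
    return {
        g: threshold_for_sensitivity(sc, lb, target_tpr)
        for g, (sc, lb) in by_group.items()
    }


def predict(scores, groups, thresholds):
    """Binary prediction using the threshold that belongs to each row's group."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


# Toy risk scores for two hypothetical groups, "A" and "B".
risk = [0.9, 0.7, 0.6, 0.4, 0.5, 0.2, 0.5, 0.3, 0.4, 0.1]
outcome = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
group = ["A"] * 6 + ["B"] * 4
cutoffs = group_thresholds(risk, outcome, group, target_tpr=0.5)
flags = predict(risk, group, cutoffs)
```

A single global cut-off would flag fewer true cases in a group whose scores run systematically lower; per-group cut-offs are one simple way to equalize a chosen error rate, at the cost of treating identical scores differently across groups.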

Citations: 0
Reducing diagnostic delays in acute hepatic porphyria using health records data and machine learning.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-07-01 DOI: 10.1093/jamia/ocae141
Balu Bhasuran, Katharina Schmolly, Yuvraaj Kapoor, Nanditha Lakshmi Jayakumar, Raymond Doan, Jigar Amin, Stephen Meninger, Nathan Cheng, Robert Deering, Karl Anderson, Simon W Beaven, Bruce Wang, Vivek A Rudrapatna

Background: Acute hepatic porphyria (AHP) is a group of rare but treatable conditions associated with diagnostic delays of 15 years on average. The advent of electronic health records (EHR) data and machine learning (ML) may improve the timely recognition of rare diseases like AHP. However, prediction models can be difficult to train given the limited case numbers, unstructured EHR data, and selection biases intrinsic to healthcare delivery. We sought to train and characterize models for identifying patients with AHP.

Methods: This diagnostic study used structured and notes-based EHR data from 2 centers at the University of California, UCSF (2012-2022) and UCLA (2019-2022). The data were split into 2 cohorts (referral and diagnosis) and used to develop models that predict (1) who will be referred for testing of acute porphyria, among those who presented with abdominal pain (a cardinal symptom of AHP), and (2) who will test positive, among those referred. The referral cohort consisted of 747 patients referred for testing and 99 849 contemporaneous patients who were not. The diagnosis cohort consisted of 72 confirmed AHP cases and 347 patients who tested negative. The case cohort was 81% female and 6-75 years old at the time of diagnosis. Candidate models used a range of architectures. Feature selection was semi-automated and incorporated publicly available data from knowledge graphs. Our primary outcome was the F-score on an outcome-stratified test set.

Results: The best center-specific referral models achieved an F-score of 86%-91%. The best diagnosis model achieved an F-score of 92%. To further test our model, we contacted 372 current patients who lack an AHP diagnosis but were predicted by our models as potentially having it (≥10% probability of referral, ≥50% of testing positive). However, we were only able to recruit 10 of these patients for biochemical testing, all of whom were negative. Nonetheless, post hoc evaluations suggested that these models could identify 71% of cases earlier than their diagnosis date, saving 1.2 years.

Conclusions: ML can reduce diagnostic delays in AHP and other rare diseases. Robust recruitment strategies and multicenter coordination will be needed to validate these models before they can be deployed.
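
The F-score used as the primary outcome can be computed from predicted and true labels as in the sketch below. The abstract does not state which variant was used, so the default `beta=1.0` (the harmonic mean of precision and recall) is an assumption, and the function name `f_score` and the toy labels are hypothetical.

```python
def f_score(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels; beta=1.0 is an assumed default."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example: 3 true cases, 2 of them found, plus 1 false alarm.
truth = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
score = f_score(truth, predicted)
```

Because true negatives do not enter the formula, the score is not inflated by the large number of unaffected patients, which matters when cases are as rare as they are in the referral cohort.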

Citations: 0
All of whom? Limitations encountered using All of Us Researcher Workbench in a Primary Care residents secondary data analysis research training block.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-06-25 DOI: 10.1093/jamia/ocae162
Fred Willie Zametkin LaPolla, Marco Barber Grossi, Sharon Chen, Tai Wei Guo, Kathryn Havranek, Olivia Jebb, Minh Thu Nguyen, Sneha Panganamamula, Noah Smith, Shree Sundaresh, Jonathan Yu, Gabrielle Mayer

Objectives: The goal of this case report is to detail the experiences and challenges encountered in training Primary Care residents in secondary analysis using the All of Us Researcher Workbench. At our large, urban safety net hospital, Primary Care/Internal Medicine residents in their third year undergo a research-intensive block, the Research Practicum, where they work as a team to conduct secondary data analysis on a dataset with faculty facilitation. In 2023, this research block focused on use of the All of Us Researcher Workbench for secondary data analysis.

Materials and methods: Two groups of 5 residents underwent training to access the All of Us Researcher Workbench, and each group explored available data with a faculty facilitator and generated original research questions. Two blocks of residents successfully completed their research blocks and created original presentations on "social isolation and A1C" levels and "medical discrimination and diabetes management."

Results: Departmental faculty were satisfied with the depth of learning and data exploration. In focus groups, some residents noted that for those without interest in performing research, the activity felt extraneous to their career goals, while others were glad for the opportunity to publish. In both blocks, residents highlighted dissatisfaction with the degree to which the All of Us Researcher Workbench was representative of patients they encounter in a large safety net hospital.

Discussion: Using the All of Us Researcher Workbench provided residents with an opportunity to explore novel questions in a massive data source. Many residents, however, noted that because the population described in the All of Us Researcher Workbench appeared to be more highly educated and less racially diverse than the patients they encounter in their practice, findings may be hard to generalize to a community health context. Additionally, because analyzing the data required knowledge of 1 of 2 code-based data analysis languages (R or Python) and work within an idiosyncratic coding environment, residents were heavily reliant on a faculty facilitator to assist with analysis.

Conclusion: Using the All of Us Researcher Workbench for research training allowed residents to explore novel questions and gain first-hand exposure to opportunities and challenges in secondary data analysis.

Citations: 0