Objective: The aim of this project was to create time-aware, individual-level risk score models for adverse drug events related to multiple sclerosis disease-modifying therapy and to provide interpretable explanations for model prediction behavior.
Materials and methods: We used temporal sequences of Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) concepts derived from an electronic health record as model features. Each concept was assigned an embedding representation that was learned from a graph convolution network trained on a knowledge graph (KG) of OMOP concept relationships. Concept embeddings were fed into long short-term memory networks for 1-year adverse event prediction following drug exposure. Finally, we implemented a novel extension of the local interpretable model-agnostic explanation (LIME) method, knowledge graph LIME (KG-LIME), to leverage the KG and explain individual predictions of each model.
Results: For a set of 4859 patients, we found that our model was effective at predicting 32 out of 56 adverse event types (P < .05) when compared to demographics and past diagnosis as variables. We also assessed discrimination in the form of area under the curve (AUC = 0.77 ± 0.15) and area under the precision-recall curve (AUC-PR = 0.31 ± 0.27) and assessed calibration in the form of Brier score (BS = 0.04 ± 0.04). Additionally, KG-LIME generated interpretable literature-validated lists of relevant medical concepts used for prediction.
Discussion and conclusion: Many of our risk models demonstrated high calibration and discrimination for adverse event prediction. Furthermore, our novel KG-LIME method was able to utilize the knowledge graph to highlight concepts that were important to prediction. Future work will be required to further explore the temporal window of adverse event occurrence beyond the generic 1-year window used here, particularly for short-term inpatient adverse events and long-term severe adverse events.
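The discrimination and calibration metrics named in the Results (AUC and Brier score) can be illustrated with a minimal sketch. The labels and predicted risks below are hypothetical and are not taken from the study; the function names are assumptions chosen for illustration, and this is not the authors' evaluation code.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = adverse event within 1 year) and predicted risks.
labels = [0, 0, 1, 0, 1]
risks = [0.1, 0.4, 0.8, 0.2, 0.3]

print(f"Brier score: {brier_score(labels, risks):.3f}")
print(f"AUC: {roc_auc(labels, risks):.3f}")
```

A lower Brier score indicates better calibration, while an AUC of 0.5 corresponds to chance-level discrimination.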
{"title":"KG-LIME: predicting individualized risk of adverse drug events for multiple sclerosis disease-modifying therapy.","authors":"Jason Patterson, Nicholas Tatonetti","doi":"10.1093/jamia/ocae155","DOIUrl":"10.1093/jamia/ocae155","url":null,"PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1693-1703"},"PeriodicalIF":4.7,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11535856/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141535796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The integration of large language models (LLMs) like ChatGPT into medical education presents potential benefits and challenges. These technologies, aligned with constructivist learning theories, could potentially enhance critical thinking and problem-solving through inquiry-based learning environments. However, the actual impact on educational outcomes and the effectiveness of these tools in fostering learning require further empirical study. This technological shift necessitates a reevaluation of curriculum design and the development of new assessment methodologies to measure its effects accurately. Additionally, the use of LLMs introduces significant ethical concerns, particularly in addressing inherent AI biases to ensure equitable educational access. LLMs may also help reduce global disparities in medical education by providing broader access to contemporary medical knowledge and practices, though their deployment must be managed carefully to truly support the training of competent, ethical medical professionals.
{"title":"Constructing knowledge: the role of AI in medical learning.","authors":"Aaron Lawson McLean","doi":"10.1093/jamia/ocae124","DOIUrl":"10.1093/jamia/ocae124","url":null,"PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1797-1798"},"PeriodicalIF":4.7,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11258398/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141174560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ivann Agapito, Tu Hoang, Michael Sayer, Ali Naqvi, Pranav M Patel, Aya F Ozaki
Importance and objective: Identifying sources of sex-based disparities is the first step in improving clinical outcomes for female patients. Using All of Us data, we examined the association of biological sex with cost-related medication adherence (CRMA) issues in patients with cardiovascular comorbidities.
Materials and methods: Retrospective data collection identified the following patients: 18 and older, completing personal medical history surveys, having hypertension (HTN), ischemic heart disease (IHD), or heart failure (HF) with medication use history consistent with these diagnoses. Implementing univariable and adjusted logistic regression, we assessed the influence of biological sex on 7 different patient-reported CRMA outcomes within HTN, IHD, and HF patients.
Results: Our study created cohorts of HTN (n = 3891), IHD (n = 5373), and HF (n = 2151) patients having CRMA outcomes data. Within each cohort, females were significantly more likely to report various cost-related medication issues: being unable to afford medications (HTN hazards ratio [HR]: 1.68, confidence interval [CI]: 1.33-2.13; IHD HR: 2.33, CI: 1.72-3.16; HF HR: 1.82, CI: 1.22-2.71), skipping doses (HTN HR: 1.76, CI: 1.30-2.39; IHD HR: 2.37, CI: 1.69-3.64; HF HR: 3.15, CI: 1.87-5.31), taking less medication (HTN HR: 1.86, CI: 1.37-2.45; IHD HR: 2.22, CI: 1.53-3.22; HF HR: 2.99, CI: 1.78-5.02), delaying filling prescriptions (HTN HR: 1.83, CI: 1.43-2.39; IHD HR: 2.02, CI: 1.48-2.77; HF HR: 2.99, CI: 1.79-5.03), and asking for lower cost medications (HTN HR: 1.41, CI: 1.16-1.72; IHD HR: 1.75, CI: 1.37-2.22; HF HR: 1.61, CI: 1.14-2.27).
Discussion and conclusion: Our results clearly demonstrate CRMA issues disproportionately affect female patients with cardiovascular comorbidities, which may contribute to the larger sex-based disparities in cardiovascular care. These findings call for targeted interventions and strategies to address these disparities and ensure equitable access to cardiovascular medications and care for all patients.
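The ratio estimates with confidence intervals reported above come from logistic regression. For a single binary predictor, the exponentiated logistic regression coefficient equals the odds ratio of the corresponding 2x2 table, which the following minimal sketch computes with a Wald interval. The counts are hypothetical, the estimate is unadjusted, and the function name is an assumption for illustration only.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome,    b: exposed without outcome
    c: unexposed with outcome,  d: unexposed without outcome
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 200 females and 20 of 200 males report skipping doses.
estimate, lower, upper = odds_ratio_ci(a=40, b=160, c=20, d=180)
print(f"OR = {estimate:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level; an adjusted model would additionally include covariates, which this sketch does not attempt.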
{"title":"Sex-based disparities with cost-related medication adherence issues in patients with hypertension, ischemic heart disease, and heart failure.","authors":"Ivann Agapito, Tu Hoang, Michael Sayer, Ali Naqvi, Pranav M Patel, Aya F Ozaki","doi":"10.1093/jamia/ocae203","DOIUrl":"https://doi.org/10.1093/jamia/ocae203","url":null,"PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141861446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chelsea N Wong, Louisa H Smith, Robert Cavanaugh, Dae H Kim, Carl G Streed, Farzana Kapadia, Brianne Olivieri-Mui
Objectives: To understand how frailty and healthcare delays differentially mediate the association between sexual and gender minority older adults (OSGM) status and healthcare utilization.
Materials and methods: Data from All of Us Research Program participants ≥50 years old were analyzed using marginal structural modelling to assess whether frailty or healthcare delays mediated the association between OSGM status and healthcare utilization. OSGM status, healthcare delays, and frailty were assessed using survey data. Electronic health record (EHR) data were used to measure the number of medical visits or mental health (MH) visit days in the 12 months following calculation of the All of Us Frailty Index. Analyses adjusted for age, race and ethnicity, income, HIV, marital status ± general MH (only MH analyses).
Results: Compared to non-OSGM, OSGM adults have higher rates of medical visits (adjusted rate ratio [aRR]: 1.14; 95% CI: 1.03, 1.24) and MH visits (aRR: 1.85; 95% CI: 1.07, 2.91). Frailty mediated the association between OSGM status and medical visits (controlled direct effect [Rcde] aRR: 1.03, 95% CI [0.87, 1.22]), but not MH visits (Rcde aRR: 0.37 [95% CI: 0.06, 1.47]). Delays mediated the association between OSGM status and MH visit days (Rcde aRR: 2.27, 95% CI [1.15, 3.76]), but not medical visits (Rcde aRR: 1.06 [95% CI: 0.97, 1.17]).
Discussion: Frailty represents a need for medical care among OSGM adults, highlighting the importance of addressing it to improve health and healthcare utilization disparities. In contrast, healthcare delays are a barrier to MH care, underscoring the necessity of targeted strategies to ensure timely MH care for OSGM adults.
{"title":"Assessing how frailty and healthcare delays mediate the association between sexual and gender minority status and healthcare utilization in the All of Us Research Program.","authors":"Chelsea N Wong, Louisa H Smith, Robert Cavanaugh, Dae H Kim, Carl G Streed, Farzana Kapadia, Brianne Olivieri-Mui","doi":"10.1093/jamia/ocae205","DOIUrl":"10.1093/jamia/ocae205","url":null,"PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141793939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elizabeth Cohn, Frida Esther Kleiman, Shayaa Muhammad, S Scott Jones, Nakisa Pourkey, Louise Bier
Objective: The All of Us Research Program aims to return value to participants by developing research capacity in communities. We describe a novel set of introductory exercises (Data Sandboxes) and specialized trainings to orient researchers to the Researcher Workbench to foster health equity research.
Materials and methods: We developed a tailored training to familiarize researchers with the All of Us Research Program: (1) orientation, (2) tailored "data treasure hunt" using the Public Data Browser, and (3) overview of the analysis tools and platform.
Results: Participants' scores on knowledge of the contents and structure of the All of Us dataset increased significantly from pre- to post-training. These trainings effectively engaged researchers in exploring this rich dataset.
Conclusion: We describe ways of orienting and familiarizing a wide variety of researchers with the All of Us Research Program dataset, sparking their interest, and "jump-starting" their research.
{"title":"Returning value to the community through the All of Us Research Program Data Sandbox model.","authors":"Elizabeth Cohn, Frida Esther Kleiman, Shayaa Muhammad, S Scott Jones, Nakisa Pourkey, Louise Bier","doi":"10.1093/jamia/ocae174","DOIUrl":"https://doi.org/10.1093/jamia/ocae174","url":null,"PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141793942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objective: We aimed to evaluate the feasibility of using ChatGPT as programming support for nursing PhD students conducting analyses using the All of Us Researcher Workbench.
Materials and methods: Nine students in a PhD-level nursing course were prospectively randomized into 2 groups that used ChatGPT for programming support on alternating assignments in the workbench. Students reported completion time, confidence, and qualitative reflections on barriers, resources used, and the learning process.
Results: The median completion time was shorter for novices and certain assignments using ChatGPT. In qualitative reflections, students reported ChatGPT helped generate and troubleshoot code and facilitated learning but was occasionally inaccurate.
Discussion: ChatGPT provided cognitive scaffolding that enabled students to move toward complex programming tasks using the All of Us Researcher Workbench but should be used in combination with other resources.
Conclusion: Our findings support the feasibility of using ChatGPT to help PhD nursing students use the All of Us Researcher Workbench to pursue novel research directions.
{"title":"Returning value from the All of Us research program to PhD-level nursing students using ChatGPT as programming support: results from a mixed-methods experimental feasibility study.","authors":"Meghan Reading Turchioe, Sergey Kisselev, Ruilin Fan, Suzanne Bakken","doi":"10.1093/jamia/ocae208","DOIUrl":"https://doi.org/10.1093/jamia/ocae208","url":null,"PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141793941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Han Yang, Sicheng Zhou, Zexi Rao, Chen Zhao, Erjia Cui, Chetan Shenoy, Anne H Blaes, Nishitha Paidimukkala, Jinhua Wang, Jue Hou, Rui Zhang
Objective: This study leverages the rich diversity of the All of Us Research Program (All of Us)'s dataset to devise a predictive model for cardiovascular disease (CVD) in breast cancer (BC) survivors. Central to this endeavor is the creation of a robust data integration pipeline that synthesizes electronic health records (EHRs), patient surveys, and genomic data, while upholding fairness across demographic variables.
Materials and methods: We have developed a universal data wrangling pipeline to process and merge heterogeneous data sources of the All of Us dataset, address missingness and variance in data, and align disparate data modalities into a coherent framework for analysis. Utilizing a composite feature set including EHR, lifestyle, and social determinants of health (SDoH) data, we then employed Adaptive Lasso and Random Forest regression models to predict 6 CVD outcomes. The models were evaluated using the c-index and time-dependent Area Under the Receiver Operating Characteristic Curve over a 10-year period.
Results: The Adaptive Lasso model showed consistent performance across most CVD outcomes, while the Random Forest model excelled particularly in predicting outcomes like transient ischemic attack when incorporating the full multi-modal feature set. Feature importance analysis revealed age and previous coronary events as dominant predictors across CVD outcomes, with SDoH clustering labels highlighting the nuanced impact of social factors.
Discussion: The development of both a Cox-based predictive model and a Random Forest regression model illustrates how All of Us data can be applied to integrate EHR and patient surveys in support of precision medicine. The inclusion of SDoH clustering labels revealed the significant impact of sociobehavioral factors on patient outcomes, emphasizing the importance of comprehensive health determinants in predictive models. Despite these advancements, limitations include the exclusion of genetic data, broad categorization of CVD conditions, and the need for fairness analyses to ensure equitable model performance across diverse populations. Future work should refine clinical and social variable measurements, incorporate advanced imputation techniques, and explore additional predictive algorithms to enhance model precision and fairness.
Conclusion: This study demonstrates the utility of the diverse All of Us dataset for developing a multi-modality predictive model for CVD risk stratification in BC survivors. The data integration pipeline and subsequent predictive models establish a methodological foundation for future research into personalized healthcare.
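The c-index used to evaluate these models can be illustrated with a minimal sketch of Harrell's concordance for right-censored data. The follow-up times, event indicators, and risk scores are hypothetical, and the function is an illustrative assumption rather than the authors' evaluation code; production analyses would typically rely on an established survival analysis library.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      scores: Sequence[float]) -> float:
    """Harrell's C: among comparable pairs, the fraction in which the subject
    with the earlier observed event has the higher predicted risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an event strictly
            # before subject j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count half
    return concordant / comparable


# Hypothetical follow-up (years), event flags (1 = CVD event, 0 = censored), risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1]
print(f"c-index: {concordance_index(times, events, scores):.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk.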
{"title":"Multi-modality risk prediction of cardiovascular diseases for breast cancer cohort in the All of Us Research Program.","authors":"Han Yang, Sicheng Zhou, Zexi Rao, Chen Zhao, Erjia Cui, Chetan Shenoy, Anne H Blaes, Nishitha Paidimukkala, Jinhua Wang, Jue Hou, Rui Zhang","doi":"10.1093/jamia/ocae199","DOIUrl":"https://doi.org/10.1093/jamia/ocae199","url":null,"PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141767875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shawna Beese, Demetrius A Abshire, Trey L DeJong, Jason T Carbone
Objectives: To evaluate the NIH All of Us Research Program database as a potential data source for studying allostatic load and stress among adults in the United States (US).
Materials and methods: We evaluated the All of Us database to determine whether the available sample sizes are adequate for the original-10 allostatic load biomarkers, Allostatic Load Index-5 (ALI-5), Allostatic Load Five, and Cohen's Perceived Stress Scale (PSS). We conducted a priori, post hoc, and sensitivity power analyses to determine the sample sizes needed for null hypothesis significance tests.
Results: The maximum number of participants with responses available for each measure is 21 for the original-10 allostatic load biomarkers, 150 for the ALI-5, 22 476 for Allostatic Load Five, and 90 583 for the PSS.
Discussion: The NIH All of Us Research Program is well-suited for studying allostatic load using the Allostatic Load Five and psychological stress using PSS.
Conclusion: Improving biomarker data collection in All of Us will facilitate more nuanced examinations of allostatic load among US adults.
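The three kinds of power analysis named in the methods above can be illustrated with a minimal sketch. The abstract does not state which tests, effect sizes, or software the authors used, so everything below is an assumption made for illustration: a two-sided, two-sample comparison of means, a normal approximation to the test statistic, and hypothetical function names.

```python
import math
from statistics import NormalDist

# Illustrative assumption: two equal-sized groups compared on a mean,
# two-sided test, normal approximation (not the authors' stated method).
_Z = NormalDist()


def a_priori_n_per_group(effect_size, alpha=0.05, power=0.80):
    """A priori analysis: per-group n needed to detect a standardized effect."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def post_hoc_power(effect_size, n_per_group, alpha=0.05):
    """Post hoc analysis: approximate power achieved with a given per-group n.

    The negligible probability of rejecting in the opposite tail is ignored.
    """
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * math.sqrt(n_per_group / 2)
    return _Z.cdf(noncentrality - z_alpha)


def sensitivity_effect_size(n_per_group, alpha=0.05, power=0.80):
    """Sensitivity analysis: smallest standardized effect detectable with n."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    return (z_alpha + z_beta) * math.sqrt(2 / n_per_group)


if __name__ == "__main__":
    n = a_priori_n_per_group(effect_size=0.5)
    print("per-group n for d = 0.5:", n)
    print("power at that n:", round(post_hoc_power(0.5, n), 3))
    print("detectable d at that n:", round(sensitivity_effect_size(n), 3))
```

Under these assumptions the three functions are inversions of the same relationship between effect size, sample size, significance level, and power. An exact calculation based on the t distribution would give slightly larger sample sizes.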
{"title":"An evaluation of the All of Us Research Program database to examine cumulative stress.","authors":"Shawna Beese, Demetrius A Abshire, Trey L DeJong, Jason T Carbone","doi":"10.1093/jamia/ocae201","DOIUrl":"https://doi.org/10.1093/jamia/ocae201","url":null,"abstract":"<p><strong>Objectives: </strong>To evaluate the NIH All of Us Research Program database as a potential data source for studying allostatic load and stress among adults in the United States (US).</p><p><strong>Materials and methods: </strong>We evaluated the All of Us database to determine whether the available sample sizes are adequate for the original-10 allostatic load biomarkers, Allostatic Load Index-5 (ALI-5), Allostatic Load Five, and Cohen's Perceived Stress Scale (PSS). We conducted a priori, post hoc, and sensitivity power analyses to determine the sample sizes needed for null hypothesis significance tests.</p><p><strong>Results: </strong>The maximum number of participants with responses available for each measure is 21 for the original-10 allostatic load biomarkers, 150 for the ALI-5, 22 476 for Allostatic Load Five, and 90 583 for the PSS.</p><p><strong>Discussion: </strong>The NIH All of Us Research Program is well-suited for studying allostatic load using the Allostatic Load Five and psychological stress using PSS.</p><p><strong>Conclusion: </strong>Improving biomarker data collection in All of Us will facilitate more nuanced examinations of allostatic load among US adults.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141767874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Janna Ter Meer, Royan Kamyar, Christina Orlovsky, Ting-Yang Hung, Tamara Benrey, Ethan Dinh-Luong, Giorgio Quer, Julia Moore Vogel
Objective: Summaries of health research can be a complementary way to return value to participants. We assess how research participants engage with summaries via email communication and how this can be improved.
Materials and methods: We look at correlations between demographic subgroups and engagement in a longitudinal dataset of 305 626 participants (77% are classified as underrepresented in biomedical research) from the All of Us Research Program. We compare this against engagement with other program communications and use impact evaluations (N = 421 510) to measure the effect of tailoring communication by (1) eliciting content preferences, (2) Spanish-focused content, (3) informational videos, and (4) article content in the email subject line.
Results: Between March 2020 and October 2021, research summaries reached 67% of enrolled participants, outperforming other program communication (60%) and return of results (31%), which has a high uptake rate but has been extended to only a subset of eligible participants. While all demographic subgroups engage with research summaries, participants with higher income and educational attainment, and those who are White or older than 45 years, open and click content most often. Surfacing article content in the email subject line and Spanish-focused content had negative effects on engagement. Video and social media content and eliciting preferences led to a small directional increase in clicks.
Discussion: Further individualization of tailoring efforts may be needed to drive larger engagement effects (eg, delivering multiple articles in line with stated preferences, expanding preference options). Our findings are likely a conservative representation of engagement effects, given the coarseness of our click rate measure.
Conclusions: Health research summaries show promise as a way to return value to research participants, especially if individual-level results cannot be returned. Personalization of communication requires testing to determine whether efforts are having the expected effect.
{"title":"Engagement with health research summaries via digital communication to All of Us participants.","authors":"Janna Ter Meer, Royan Kamyar, Christina Orlovsky, Ting-Yang Hung, Tamara Benrey, Ethan Dinh-Luong, Giorgio Quer, Julia Moore Vogel","doi":"10.1093/jamia/ocae185","DOIUrl":"https://doi.org/10.1093/jamia/ocae185","url":null,"abstract":"<p><strong>Objective: </strong>Summaries of health research can be a complementary way to return value to participants. We assess how research participants engage with summaries via email communication and how this can be improved.</p><p><strong>Materials and methods: </strong>We look at correlations between demographic subgroups and engagement in a longitudinal dataset of 305 626 participants (77% are classified as underrepresented in biomedical research) from the All of Us Research Program. We compare this against engagement with other program communications and use impact evaluations (N = 421 510) to measure the effect of tailoring communication by (1) eliciting content preferences, (2) Spanish-focused content, (3) informational videos, and (4) article content in the email subject line.</p><p><strong>Results: </strong>Between March 2020 and October 2021, research summaries reached 67% of enrolled participants, outperforming other program communication (60%) and return of results (31%), which has a high uptake rate but has been extended to only a subset of eligible participants. While all demographic subgroups engage with research summaries, participants with higher income and educational attainment, and those who are White or older than 45 years, open and click content most often. Surfacing article content in the email subject line and Spanish-focused content had negative effects on engagement. 
Video and social media content and eliciting preferences led to a small directional increase in clicks.</p><p><strong>Discussion: </strong>Further individualization of tailoring efforts may be needed to drive larger engagement effects (eg, delivering multiple articles in line with stated preferences, expanding preference options). Our findings are likely a conservative representation of engagement effects, given the coarseness of our click rate measure.</p><p><strong>Conclusions: </strong>Health research summaries show promise as a way to return value to research participants, especially if individual-level results cannot be returned. Personalization of communication requires testing to determine whether efforts are having the expected effect.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141762142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objectives: Despite easy-to-use tools like the Cohort Builder, using All of Us Research Program data for complex research questions requires a relatively high level of technical expertise. We aimed to increase research and training capacity and reduce barriers to entry for the All of Us community through an R package, allofus. In this article, we describe functions that address common challenges we encountered while working with All of Us Research Program data, and we demonstrate this functionality with an example of creating a cohort of All of Us participants by synthesizing electronic health record and survey data with time dependencies.
Target audience: All of Us Research Program data are widely available to health researchers. The allofus R package is aimed at a wide range of researchers who wish to conduct complex analyses using best practices for reproducibility and transparency, and who have a range of experience using R. Because the All of Us data are transformed into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), researchers familiar with existing OMOP CDM tools or who wish to conduct network studies in conjunction with other OMOP CDM data will also find value in the package.
Scope: We developed an initial set of functions that solve problems we experienced across survey and electronic health record data in our own research and in mentoring student projects. The package will continue to grow and develop with the All of Us Research Program. The allofus R package can help build community research capacity by increasing access to the All of Us Research Program data, the efficiency of its use, and the rigor and reproducibility of the resulting research.
{"title":"allofus: an R package to facilitate use of the All of Us Researcher Workbench.","authors":"Louisa H Smith, Robert Cavanaugh","doi":"10.1093/jamia/ocae198","DOIUrl":"https://doi.org/10.1093/jamia/ocae198","url":null,"abstract":"<p><strong>Objectives: </strong>Despite easy-to-use tools like the Cohort Builder, using All of Us Research Program data for complex research questions requires a relatively high level of technical expertise. We aimed to increase research and training capacity and reduce barriers to entry for the All of Us community through an R package, allofus. In this article, we describe functions that address common challenges we encountered while working with All of Us Research Program data, and we demonstrate this functionality with an example of creating a cohort of All of Us participants by synthesizing electronic health record and survey data with time dependencies.</p><p><strong>Target audience: </strong>All of Us Research Program data are widely available to health researchers. The allofus R package is aimed at a wide range of researchers who wish to conduct complex analyses using best practices for reproducibility and transparency, and who have a range of experience using R. Because the All of Us data are transformed into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), researchers familiar with existing OMOP CDM tools or who wish to conduct network studies in conjunction with other OMOP CDM data will also find value in the package.</p><p><strong>Scope: </strong>We developed an initial set of functions that solve problems we experienced across survey and electronic health record data in our own research and in mentoring student projects. The package will continue to grow and develop with the All of Us Research Program. 
The allofus R package can help build community research capacity by increasing access to the All of Us Research Program data, the efficiency of its use, and the rigor and reproducibility of the resulting research.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141753237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}