Louisa A Stark, Kristin E Fenker, Harini Krishnan, Molly Malone, Rebecca J Peterson, Regina Cowan, Jeremy Ensrud, Hector Gamboa, Crstina Gayed, Patricia Refino, Tia Tolk, Teresa Walters, Yong Crosby, Rubin Baskir
Objectives: We describe new curriculum materials for engaging secondary school students in exploring the "big data" in the NIH All of Us Research Program's Public Data Browser and the co-design processes used to collaboratively develop the materials. We also describe the methods used to develop and validate assessment items for studying the efficacy of the materials for student learning as well as preliminary findings from these studies.
Materials and methods: Secondary-level biology teachers from across the United States participated in a 2.5-day Co-design Summer Institute. After learning about the All of Us Research Program and its Data Browser, they collaboratively developed learning objectives and initial ideas for learning experiences related to exploring the Data Browser and big data. The Genetic Science Learning Center team at the University of Utah further developed the educators' ideas. Additional teachers and their students participated in classroom pilot studies to validate a 22-item instrument that assesses students' knowledge. Educators completed surveys about the materials and their experiences.
Results: The "Exploring Big Data with the All of Us Data Browser" curriculum module includes 3 data exploration guides that engage students in using the Data Browser, 3 related multimedia pieces, and teacher support materials. Pilot testing showed substantial growth in students' understanding of key big data concepts and research applications.
Discussion and conclusion: Our co-design process provides a model for educator engagement. The new curriculum module serves as a model for introducing secondary students to big data and precision medicine research by exploring diverse real-world datasets.
Citation: Research to classrooms: a co-designed curriculum brings All of Us data to secondary schools. Journal of the American Medical Informatics Association. Published 2024-07-09. doi:10.1093/jamia/ocae167
Objectives: To highlight the use of calibration weighting to improve the precision of estimates obtained from All of Us data and increase the return of value to communities from the All of Us Research Program.
Materials and methods: We used All of Us (2017-2022) data and raking to obtain prevalence estimates in two examples: discrimination in medical settings (N = 41 875) and food insecurity (N = 82 266). Weights were constructed using known population proportions (age, sex, race/ethnicity, region of residence, annual household income, and home ownership) from the 2020 National Health Interview Survey.
Results: About 37% of adults experienced discrimination in a medical setting. About 20% of adults who had not seen a doctor reported being food insecure compared with 14% of adults who regularly saw a doctor.
Conclusions: Calibration using raking is cost-effective and may lead to more precise estimates when analyzing All of Us data.
Citation: Vivian Hsing-Chun Wang, Julie Holm, José A Pagán. Use of calibration to improve the precision of estimates obtained from All of Us data. Journal of the American Medical Informatics Association. Published 2024-07-09. doi:10.1093/jamia/ocae181
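The raking step named in the Materials and methods above can be illustrated with a minimal sketch of iterative proportional fitting. The toy sample, the two weighting variables, and the target proportions are all invented for illustration; the study itself raked on six variables using National Health Interview Survey proportions, and its actual software is not stated in the abstract.

```python
# Raking (iterative proportional fitting) on a toy sample.
# All data and target proportions below are invented for illustration.

def rake(rows, targets, max_iter=100, tol=1e-10):
    """Return one weight per row so that weighted margins match `targets`.

    rows    : list of dicts, e.g. {"sex": "F", "age": "young"}
    targets : {variable: {level: population proportion}}
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in targets.items():
            # Current weighted total for each level of this variable.
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            # Scale weights so this variable's margin hits its target.
            for i, row in enumerate(rows):
                factor = target[row[var]] * n / totals[row[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights


def weighted_share(rows, weights, var, level):
    """Weighted proportion of rows whose `var` equals `level`."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == level)
    return hit / sum(weights)


# Hypothetical sample: 60% female and 70% young before weighting.
sample = (
    [{"sex": "F", "age": "young"}] * 4
    + [{"sex": "F", "age": "old"}] * 2
    + [{"sex": "M", "age": "young"}] * 3
    + [{"sex": "M", "age": "old"}] * 1
)
# Hypothetical population margins to calibrate to.
targets = {"sex": {"F": 0.5, "M": 0.5}, "age": {"young": 0.5, "old": 0.5}}

weights = rake(sample, targets)
print(round(weighted_share(sample, weights, "sex", "F"), 4))
print(round(weighted_share(sample, weights, "age", "young"), 4))
```

A weighted prevalence estimate would then be the weight-sum of respondents reporting the outcome divided by the total weight, computed the same way as `weighted_share`.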
Rachel M Ancona, Benjamin P Cooper, Randi Foraker, Taylor Kaser, Opeolu Adeoye, Kristen L Mueller
Objectives: To improve firearm injury encounter classification (new vs follow-up) using machine learning (ML) and compare our ML model to other common approaches.
Materials and methods: This retrospective study used data from the St Louis region-wide hospital-based violence intervention program data repository (2010-2020). We randomly selected 500 patients with a firearm injury diagnosis for inclusion, with 808 total firearm injury encounters split (70/30) for training and testing. We trained a least absolute shrinkage and selection operator (LASSO) regression model with the following predictors: admission type, time between firearm injury visits, number of prior firearm injury emergency department (ED) visits, encounter type (ED or other), and diagnostic codes. Our gold standard for new firearm injury encounter classification was manual chart review. We then used our test data to compare the performance of our ML model to other commonly used approaches (proxy measures of ED visits and time between firearm injury encounters, and diagnostic code encounter type designation [initial vs subsequent or sequela]). Performance metrics included area under the curve (AUC), sensitivity, and specificity with 95% confidence intervals (CIs).
Results: The ML model had excellent discrimination (AUC 0.92, 95% CI 0.88-0.96) with high sensitivity (0.95, 95% CI 0.90-0.98) and specificity (0.89, 95% CI 0.81-0.95). The model's AUC was significantly higher than that of the time-based measures, its sensitivity was slightly (but not significantly) lower than that of the other approaches, and its specificity was higher than that of all other methods.
Discussion: ML successfully delineated new firearm injury encounters, outperforming other approaches in ruling out encounters for follow-up.
Conclusion: ML can be used to identify new firearm injury encounters and may be particularly useful in studies assessing re-injuries.
Citation: Machine learning classification of new firearm injury encounters in the St Louis region: 2010-2020. Journal of the American Medical Informatics Association. Published 2024-07-08. doi:10.1093/jamia/ocae173
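The performance metrics reported for this classifier (AUC, sensitivity, specificity) can be computed as in the sketch below. The labels and scores are invented, the 0.5 cut-off is an assumption, and the AUC uses the pairwise-ranking definition rather than whatever package the authors used.

```python
# Sensitivity, specificity, and AUC on invented labels and scores.
# Label 1 = new injury encounter, 0 = follow-up (toy data only).

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.75)
print(auc(y_true, scores))  # 0.9375
```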
Jifan Gao, Philip Mar, Zheng-Zheng Tang, Guanhua Chen
Objective: This study aims to develop machine learning models that provide both accurate and equitable predictions of 2-year stroke risk for patients with atrial fibrillation across diverse racial groups.
Materials and methods: Our study utilized structured electronic health records (EHR) data from the All of Us Research Program. Machine learning models (LightGBM) were utilized to capture the relations between stroke risks and the predictors used by the widely recognized CHADS2 and CHA2DS2-VASc scores. We mitigated the racial disparity by creating a representative tuning set, customizing tuning criteria, and setting binary thresholds separately for subgroups. We constructed a hold-out test set that not only supports temporal validation but also includes a larger proportion of Black/African Americans for fairness validation.
Results: Compared to the original CHADS2 and CHA2DS2-VASc scores, significant improvements were achieved by modeling their predictors using machine learning models (Area Under the Receiver Operating Characteristic curve from near 0.70 to above 0.80). Furthermore, applying our disparity mitigation strategies can effectively enhance model fairness compared to the conventional cross-validation approach.
Discussion: Modeling CHADS2 and CHA2DS2-VASc risk factors with LightGBM and our disparity mitigation strategies achieved decent discriminative performance and excellent fairness performance. In addition, this approach can provide a complete interpretation of each predictor. These findings highlight its potential utility in clinical practice.
Conclusions: Our research presents a practical example of addressing clinical challenges through the All of Us Research Program data. The disparity mitigation framework we proposed is adaptable across various models and data modalities, demonstrating broad potential in clinical informatics.
Citation: Fair prediction of 2-year stroke risk in patients with atrial fibrillation. Journal of the American Medical Informatics Association. Published 2024-07-03. doi:10.1093/jamia/ocae170
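One of the mitigation steps listed in the Materials and methods above, setting binary thresholds separately for subgroups, can be sketched as follows. The selection rule (the highest threshold that reaches a target sensitivity within each group), the group labels, and the data are assumptions made for illustration; the abstract does not state the criterion the authors used.

```python
# Per-subgroup decision thresholds on invented risk scores.
# The target-sensitivity rule is an assumed criterion, not the paper's.

def pick_threshold(y_true, scores, target_sensitivity):
    """Highest score threshold whose sensitivity is at least the target."""
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one positive case")
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        if tp / positives >= target_sensitivity:
            return thr
    return min(scores)  # reached only if target_sensitivity > 1


def thresholds_by_group(records, target_sensitivity):
    """records: iterable of (group, label, score) tuples."""
    by_group = {}
    for group, label, score in records:
        labels, scores = by_group.setdefault(group, ([], []))
        labels.append(label)
        scores.append(score)
    return {
        group: pick_threshold(labels, scores, target_sensitivity)
        for group, (labels, scores) in by_group.items()
    }


def predict(group, score, thresholds):
    """Binary prediction using the threshold of the patient's own group."""
    return int(score >= thresholds[group])


# Hypothetical tuning set: group B's scores run lower overall.
records = [
    ("A", 1, 0.9), ("A", 1, 0.7), ("A", 0, 0.6), ("A", 1, 0.5), ("A", 0, 0.2),
    ("B", 1, 0.6), ("B", 1, 0.4), ("B", 0, 0.35), ("B", 1, 0.3), ("B", 0, 0.1),
]
thresholds = thresholds_by_group(records, target_sensitivity=0.6)
print(thresholds)  # {'A': 0.7, 'B': 0.4}
```

With a single shared threshold of 0.7, group B would have no predicted positives at all in this toy set, which is the kind of disparity that group-specific thresholds are meant to reduce.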
Balu Bhasuran, Katharina Schmolly, Yuvraaj Kapoor, Nanditha Lakshmi Jayakumar, Raymond Doan, Jigar Amin, Stephen Meninger, Nathan Cheng, Robert Deering, Karl Anderson, Simon W Beaven, Bruce Wang, Vivek A Rudrapatna
Background: Acute hepatic porphyria (AHP) is a group of rare but treatable conditions associated with diagnostic delays of 15 years on average. The advent of electronic health records (EHR) data and machine learning (ML) may improve the timely recognition of rare diseases like AHP. However, prediction models can be difficult to train given the limited case numbers, unstructured EHR data, and selection biases intrinsic to healthcare delivery. We sought to train and characterize models for identifying patients with AHP.
Methods: This diagnostic study used structured and notes-based EHR data from 2 University of California centers: UCSF (2012-2022) and UCLA (2019-2022). The data were split into 2 cohorts (referral and diagnosis) and used to develop models that predict (1) who will be referred for acute porphyria testing, among those who presented with abdominal pain (a cardinal symptom of AHP), and (2) who will test positive, among those referred. The referral cohort consisted of 747 patients referred for testing and 99 849 contemporaneous patients who were not. The diagnosis cohort consisted of 72 confirmed AHP cases and 347 patients who tested negative. The case cohort was 81% female and 6-75 years old at the time of diagnosis. Candidate models used a range of architectures. Feature selection was semi-automated and incorporated publicly available data from knowledge graphs. Our primary outcome was the F-score on an outcome-stratified test set.
Results: The best center-specific referral models achieved an F-score of 86%-91%. The best diagnosis model achieved an F-score of 92%. To further test our model, we contacted 372 current patients who lack an AHP diagnosis but were predicted by our models as potentially having it (≥10% probability of referral, ≥50% probability of testing positive). However, we were only able to recruit 10 of these patients for biochemical testing, all of whom were negative. Nonetheless, post hoc evaluations suggested that these models could identify 71% of cases earlier than their diagnosis date, saving 1.2 years.
Conclusions: ML can reduce diagnostic delays in AHP and other rare diseases. Robust recruitment strategies and multicenter coordination will be needed to validate these models before they can be deployed.
Citation: Reducing diagnostic delays in acute hepatic porphyria using health records data and machine learning. Journal of the American Medical Informatics Association. Published 2024-07-01. doi:10.1093/jamia/ocae141
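The F-score used as the primary outcome above combines precision and recall into one number. The sketch below computes the general F-beta score from confusion-matrix counts; the counts are invented, and treating the paper's "F-score" as F1 (beta = 1) is an assumption, since the abstract does not give beta.

```python
# F-beta score from confusion-matrix counts (invented numbers).
# beta = 1 gives the common F1 score; the paper's beta is not stated.

def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true/false positive and false negative counts."""
    return tp / (tp + fp), tp / (tp + fn)


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)."""
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


# Hypothetical test-set counts for a rare-disease classifier.
tp, fp, fn = 8, 2, 1
precision, recall = precision_recall(tp, fp, fn)
f1 = f_beta(tp, fp, fn)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Because true negatives do not enter the formula, the F-score stays informative when negatives vastly outnumber positives, as in the referral cohort described above.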
Emma A M Stanley, Raissa Souza, Anthony J Winder, Vedant Gulve, Kimberly Amador, Matthias Wilms, Nils D Forkert
Objective: Artificial intelligence (AI) models trained using medical images for clinical tasks often exhibit bias in the form of subgroup performance disparities. However, since not all sources of bias in real-world medical imaging data are easily identifiable, it is challenging to comprehensively assess their impacts. In this article, we introduce an analysis framework for systematically and objectively investigating the impact of biases in medical images on AI models.
Materials and methods: Our framework utilizes synthetic neuroimages with known disease effects and sources of bias. We evaluated the impact of bias effects and the efficacy of 3 bias mitigation strategies in counterfactual data scenarios on a convolutional neural network (CNN) classifier.
Results: The analysis revealed that training a CNN model on the datasets containing bias effects resulted in expected subgroup performance disparities. Moreover, reweighing was the most successful bias mitigation strategy for this setup. Finally, we demonstrated that explainable AI methods can aid in investigating the manifestation of bias in the model using this framework.
Discussion: The value of this framework is showcased in our findings on the impact of bias scenarios and efficacy of bias mitigation in a deep learning model pipeline. This systematic analysis can be easily expanded to conduct further controlled in silico trials in other investigations of bias in medical imaging AI.
Conclusion: Our novel methodology for objectively studying bias in medical imaging AI can help support the development of clinical decision-support tools that are robust and responsible.
Citation: Towards objective and systematic evaluation of bias in artificial intelligence for medical imaging. Journal of the American Medical Informatics Association. Published 2024-06-28. doi:10.1093/jamia/ocae165
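Reweighing, reported above as the most successful mitigation strategy in this setup, is commonly implemented by giving each training sample the weight P(group) x P(label) / P(group, label), so that group and label look statistically independent after weighting. The sketch below assumes that standard formulation (due to Kamiran and Calders); the abstract does not specify the variant used, and the data are invented.

```python
# Reweighing sample weights on invented group/label data.
# Assumes the standard P(group) * P(label) / P(group, label) formulation.
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample that decorrelates group and label."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


# Hypothetical training set: disease label 1 is over-represented in group A.
groups = ["A"] * 6 + ["B"] * 4
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
print(weighted_positive_rate(groups, labels, weights, "A"))
print(weighted_positive_rate(groups, labels, weights, "B"))
```

In training, these weights would typically multiply each sample's contribution to the loss, so under-represented group and label combinations count for more.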
Maura Beaton, Xinzhuo Jiang, Elise Minto, Chun Yee Lau, Lennon Turner, George Hripcsak, Kanchan Chaudhari, Karthik Natarajan
Objective: To evaluate the use of patient portal messaging to recruit individuals historically underrepresented in biomedical research (UBR) to the All of Us Research Program (AoURP) at a single recruitment site.
Materials and methods: Patient portal-based recruitment was implemented at Columbia University Irving Medical Center. Patient engagement was assessed using patients' electronic health records (EHRs) at four recruitment stages: consenting to be contacted, opening messages, responding to messages, and showing interest in participating. Demographic and socioeconomic data were also collected from patients' EHRs, and univariate logistic regression analyses were conducted to assess patient engagement.
Results: Between October 2022 and November 2023, a total of 59 592 patients received patient portal messages inviting them to join the AoURP. Among them, 24 445 (41.0%) opened the message, 8983 (15.1%) responded, and 3765 (6.3%) showed interest in joining the program. Though we were unable to link enrollment data with EHR data, we estimate about 2% of patients contacted ultimately enrolled in the AoURP. Patients from underrepresented race and ethnicity communities had lower odds of consenting to be contacted and opening messages, but higher odds of showing interest after responding.
Discussion: Patient portal messaging provided both patients and recruitment staff with a more efficient approach to outreach, but patterns of engagement varied across UBR groups.
Conclusion: Patient portal-based recruitment enables researchers to contact a substantial number of participants from diverse communities. However, more effort is needed to improve engagement from underrepresented racial and ethnic groups at the early stages of the recruitment process.
Citation: Using patient portals for large-scale recruitment of individuals underrepresented in biomedical research: an evaluation of engagement patterns throughout the patient portal recruitment process at a single site within the All of Us Research Program. Journal of the American Medical Informatics Association. Published 2024-06-25. doi:10.1093/jamia/ocae135
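The engagement percentages in the Results above follow directly from the reported counts, as the short sketch below shows. The counts come from the abstract; the stage-to-stage conversion rates are derived here for illustration and are not reported by the authors.

```python
# Recruitment funnel rates from the counts reported in the abstract.
# Stage-to-stage rates are derived here, not reported by the authors.

funnel = [
    ("received", 59592),
    ("opened", 24445),
    ("responded", 8983),
    ("interested", 3765),
]


def funnel_rates(stages):
    """Return (stage, count, % of first stage, % of previous stage) rows."""
    base = stages[0][1]
    rows = []
    previous = None
    for name, count in stages:
        pct_of_all = 100.0 * count / base
        pct_of_previous = None if previous is None else 100.0 * count / previous
        rows.append((name, count, pct_of_all, pct_of_previous))
        previous = count
    return rows


rates = funnel_rates(funnel)
for name, count, pct_all, pct_prev in rates:
    step = "" if pct_prev is None else f", {pct_prev:.1f}% of previous stage"
    print(f"{name}: {count} ({pct_all:.1f}% of messaged{step})")
```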
Fred Willie Zametkin LaPolla, Marco Barber Grossi, Sharon Chen, Tai Wei Guo, Kathryn Havranek, Olivia Jebb, Minh Thu Nguyen, Sneha Panganamamula, Noah Smith, Shree Sundaresh, Jonathan Yu, Gabrielle Mayer
Objectives: The goal of this case report is to detail the experiences and challenges encountered in training Primary Care residents in secondary analysis using the All of Us Researcher Workbench. At our large, urban safety net hospital, Primary Care/Internal Medicine residents in their third year undergo a research-intensive block, the Research Practicum, where they work as a team to conduct secondary data analysis on a dataset with faculty facilitation. In 2023, this research block focused on use of the All of Us Researcher Workbench for secondary data analysis.
Materials and methods: Two groups of 5 residents underwent training to access the All of Us Researcher Workbench, and each group explored available data with a faculty facilitator and generated original research questions. Two blocks of residents successfully completed their research blocks and created original presentations on "social isolation and A1C" levels and "medical discrimination and diabetes management."
Results: Departmental faculty were satisfied with the depth of learning and data exploration. In focus groups, some residents noted that for those without interest in performing research, the activity felt extraneous to their career goals, while others were glad for the opportunity to publish. In both blocks, residents highlighted dissatisfaction with the degree to which the All of Us Researcher Workbench was representative of patients they encounter in a large safety net hospital.
Discussion: Using the All of Us Researcher Workbench provided residents with an opportunity to explore novel questions in a massive data source. Many residents, however, noted that because the population described in the All of Us Researcher Workbench appeared to be more highly educated and less racially diverse than the patients they encounter in their practice, research findings may be hard to generalize in a community health context. Additionally, because working with the data required knowledge of 1 of 2 code-based data analysis languages (R or Python) and work within an idiosyncratic coding environment, residents were heavily reliant on a faculty facilitator to assist with analysis.
Conclusion: Using the All of Us Researcher Workbench for research training allowed residents to explore novel questions and gain first-hand exposure to opportunities and challenges in secondary data analysis.
{"title":"All of whom? Limitations encountered using All of Us Researcher Workbench in a Primary Care residents secondary data analysis research training block.","authors":"Fred Willie Zametkin LaPolla, Marco Barber Grossi, Sharon Chen, Tai Wei Guo, Kathryn Havranek, Olivia Jebb, Minh Thu Nguyen, Sneha Panganamamula, Noah Smith, Shree Sundaresh, Jonathan Yu, Gabrielle Mayer","doi":"10.1093/jamia/ocae162","DOIUrl":"https://doi.org/10.1093/jamia/ocae162","url":null,"abstract":"<p><strong>Objectives: </strong>The goal of this case report is to detail experiences and challenges experienced in the training of Primary Care residents in secondary analysis using All of Us Researcher Workbench. At our large, urban safety net hospital, Primary Care/Internal Medicine residents in their third year undergo a research intensive block, the Research Practicum, where they work as a team to conduct secondary data analysis on a dataset with faculty facilitation. In 2023, this research block focused on use of the All of Us Researcher Workbench for secondary data analysis.</p><p><strong>Materials and methods: </strong>Two groups of 5 residents underwent training to access the All of Us Researcher Workbench, and each group explored available data with a faculty facilitator and generated original research questions. Two blocks of residents successfully completed their research blocks and created original presentations on \"social isolation and A1C\" levels and \"medical discrimination and diabetes management.\"</p><p><strong>Results: </strong>Departmental faculty were satisfied with the depth of learning and data exploration. In focus groups, some residents noted that for those without interest in performing research, the activity felt extraneous to their career goals, while others were glad for the opportunity to publish. 
In both blocks, residents highlighted dissatisfaction with the degree to which the All of Us Researcher Workbench was representative of patients they encounter in a large safety net hospital.</p><p><strong>Discussion: </strong>Using the All of Us Researcher Workbench provided residents with an opportunity to explore novel questions in a massive data source. Many residents however noted that because the population described in the All of Us Researcher Workbench appeared to be more highly educated and less racially diverse than patients they encounter in their practice, research may be hard to generalize in a community health context. Additionally, given that the data required knowledge of 1 of 2 code-based data analysis languages (R or Python) and work within an idiosyncratic coding environment, residents were heavily reliant on a faculty facilitator to assist with analysis.</p><p><strong>Conclusion: </strong>Using the All of Us Researcher Workbench for research training allowed residents to explore novel questions and gain first-hand exposure to opportunities and challenges in secondary data analysis.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":null,"pages":null},"PeriodicalIF":4.7,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141452050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zuotian Li, Xiang Liu, Ziyang Tang, Nanxin Jin, Pengyue Zhang, Michael T Eadon, Qianqian Song, Yingjie V Chen, Jing Su
Objective: Our objective is to develop and validate TrajVis, an interactive tool that assists clinicians in using artificial intelligence (AI) models to leverage patients' longitudinal electronic medical records (EMRs) for personalized precision management of chronic disease progression.
Materials and methods: We first perform requirement analysis with clinicians and data scientists to determine the visual analytics tasks of the TrajVis system as well as its design and functionalities. A graph AI model for chronic kidney disease (CKD) trajectory inference named DisEase PrOgression Trajectory (DEPOT) is used for system development and demonstration. TrajVis is implemented as a full-stack web application with synthetic EMR data derived from the Atrium Health Wake Forest Baptist Translational Data Warehouse and the Indiana Network for Patient Care research database. A case study with a nephrologist and a user experience survey of clinicians and data scientists are conducted to evaluate the TrajVis system.
Results: The TrajVis clinical information system is composed of 4 panels: the Patient View for demographic and clinical information, the Trajectory View to visualize the DEPOT-derived CKD trajectories in latent space, the Clinical Indicator View to elucidate longitudinal patterns of clinical features and interpret DEPOT predictions, and the Analysis View to demonstrate personal CKD progression trajectories. System evaluations suggest that TrajVis supports clinicians in summarizing clinical data, identifying individualized risk predictors, and visualizing patient disease progression trajectories, overcoming the barriers of AI implementation in healthcare.
Discussion: The TrajVis system provides a novel visualization solution that is complementary to other risk estimators such as the Kidney Failure Risk Equations.
Conclusion: TrajVis bridges the gap between the fast-growing AI/ML modeling and the clinical use of such models for personalized and precision management of chronic diseases.
{"title":"TrajVis: a visual clinical decision support system to translate artificial intelligence trajectory models in the precision management of chronic kidney disease.","authors":"Zuotian Li, Xiang Liu, Ziyang Tang, Nanxin Jin, Pengyue Zhang, Michael T Eadon, Qianqian Song, Yingjie V Chen, Jing Su","doi":"10.1093/jamia/ocae158","DOIUrl":"https://doi.org/10.1093/jamia/ocae158","url":null,"abstract":"<p><strong>Objective: </strong>Our objective is to develop and validate TrajVis, an interactive tool that assists clinicians in using artificial intelligence (AI) models to leverage patients' longitudinal electronic medical records (EMRs) for personalized precision management of chronic disease progression.</p><p><strong>Materials and methods: </strong>We first perform requirement analysis with clinicians and data scientists to determine the visual analytics tasks of the TrajVis system as well as its design and functionalities. A graph AI model for chronic kidney disease (CKD) trajectory inference named DisEase PrOgression Trajectory (DEPOT) is used for system development and demonstration. TrajVis is implemented as a full-stack web application with synthetic EMR data derived from the Atrium Health Wake Forest Baptist Translational Data Warehouse and the Indiana Network for Patient Care research database. A case study with a nephrologist and a user experience survey of clinicians and data scientists are conducted to evaluate the TrajVis system.</p><p><strong>Results: </strong>The TrajVis clinical information system is composed of 4 panels: the Patient View for demographic and clinical information, the Trajectory View to visualize the DEPOT-derived CKD trajectories in latent space, the Clinical Indicator View to elucidate longitudinal patterns of clinical features and interpret DEPOT predictions, and the Analysis View to demonstrate personal CKD progression trajectories. 
System evaluations suggest that TrajVis supports clinicians in summarizing clinical data, identifying individualized risk predictors, and visualizing patient disease progression trajectories, overcoming the barriers of AI implementation in healthcare.</p><p><strong>Discussion: </strong>The TrajVis system provides a novel visualization solution that is complementary to other risk estimators such as the Kidney Failure Risk Equations.</p><p><strong>Conclusion: </strong>TrajVis bridges the gap between the fast-growing AI/ML modeling and the clinical use of such models for personalized and precision management of chronic diseases.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":null,"pages":null},"PeriodicalIF":4.7,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kiana L Martinez, Andrew Klein, Jennifer R Martin, Chinwuwanuju U Sampson, Jason B Giles, Madison L Beck, Krupa Bhakta, Gino Quatraro, Juvie Farol, Jason H Karnes
Objectives: ABO blood types have widespread clinical use and robust associations with disease. The purpose of this study is to evaluate the portability and suitability of tag single-nucleotide polymorphisms (tSNPs) used to determine ABO alleles and blood types across diverse populations in published literature.
Materials and methods: Bibliographic databases were searched for studies using tSNPs to determine ABO alleles. We calculated linkage between tSNPs and functional variants across inferred continental ancestry groups from 1000 Genomes. We compared r² across ancestry and assessed real-world consequences by comparing tSNP-derived blood types to serology in a diverse population from the All of Us Research Program.
Results: Linkage between functional variants and O allele tSNPs was significantly lower in African (median r² = 0.443) compared to East Asian (r² = 0.946, P = 1.1 × 10⁻⁵) and European (r² = 0.869, P = .023) populations. In All of Us, discordance between tSNP-derived blood types and serology was high across all SNPs in African ancestry individuals and linkage was strongly correlated with discordance across all ancestries (ρ = -0.90, P = 3.08 × 10⁻²³).
Discussion: Many studies determine ABO blood types using tSNPs. However, tSNPs with low linkage disequilibrium promote misinference of ABO blood types, particularly in diverse populations. We observe common use of inappropriate tSNPs to determine ABO blood type, particularly for O alleles, with some tSNPs mistyping up to 58% of individuals.
Conclusion: Our results highlight the lack of transferability of tSNPs across ancestries and potential exacerbation of disparities in genomic research for underrepresented populations. This is especially relevant as more diverse cohorts are made publicly available.
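The abstract does not say which software was used to compute linkage, so the following is only a minimal sketch of the standard r² statistic for two biallelic loci, calculated from phased haplotypes. The `ld_r2` function and the toy haplotypes are hypothetical and are not drawn from 1000 Genomes or All of Us data.

```python
def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic loci.

    haplotypes: list of (a, b) pairs, one per phased haplotype, where
    a is the allele (0 or 1) at the tag SNP and b is the allele (0 or 1)
    at the functional variant.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at the tag SNP
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at the functional variant
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                             # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Toy data (assumption): the tag SNP tracks the functional variant perfectly.
perfect = [(1, 1), (0, 0)] * 5
# Toy data (assumption): two of ten haplotypes break the association.
imperfect = [(1, 1)] * 4 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 4

r2_perfect = ld_r2(perfect)
r2_imperfect = ld_r2(imperfect)
print(r2_perfect, r2_imperfect)
```

In this toy setup, a tag SNP that always travels with the functional variant gives r² = 1, while a handful of recombinant haplotypes lowers it noticeably. That is the mechanism by which a tSNP chosen in one ancestry group can mistype blood groups in another.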
{"title":"Disparities in ABO blood type determination across diverse ancestries: a systematic review and validation in the All of Us Research Program.","authors":"Kiana L Martinez, Andrew Klein, Jennifer R Martin, Chinwuwanuju U Sampson, Jason B Giles, Madison L Beck, Krupa Bhakta, Gino Quatraro, Juvie Farol, Jason H Karnes","doi":"10.1093/jamia/ocae161","DOIUrl":"https://doi.org/10.1093/jamia/ocae161","url":null,"abstract":"<p><strong>Objectives: </strong>ABO blood types have widespread clinical use and robust associations with disease. The purpose of this study is to evaluate the portability and suitability of tag single-nucleotide polymorphisms (tSNPs) used to determine ABO alleles and blood types across diverse populations in published literature.</p><p><strong>Materials and methods: </strong>Bibliographic databases were searched for studies using tSNPs to determine ABO alleles. We calculated linkage between tSNPs and functional variants across inferred continental ancestry groups from 1000 Genomes. We compared r² across ancestry and assessed real-world consequences by comparing tSNP-derived blood types to serology in a diverse population from the All of Us Research Program.</p><p><strong>Results: </strong>Linkage between functional variants and O allele tSNPs was significantly lower in African (median r² = 0.443) compared to East Asian (r² = 0.946, P = 1.1 × 10⁻⁵) and European (r² = 0.869, P = .023) populations. In All of Us, discordance between tSNP-derived blood types and serology was high across all SNPs in African ancestry individuals and linkage was strongly correlated with discordance across all ancestries (ρ = -0.90, P = 3.08 × 10⁻²³).</p><p><strong>Discussion: </strong>Many studies determine ABO blood types using tSNPs. However, tSNPs with low linkage disequilibrium promote misinference of ABO blood types, particularly in diverse populations. 
We observe common use of inappropriate tSNPs to determine ABO blood type, particularly for O alleles and with some tSNPs mistyping up to 58% of individuals.</p><p><strong>Conclusion: </strong>Our results highlight the lack of transferability of tSNPs across ancestries and potential exacerbation of disparities in genomic research for underrepresented populations. This is especially relevant as more diverse cohorts are made publicly available.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":null,"pages":null},"PeriodicalIF":4.7,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141452052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}