Pub Date: 2023-11-14; eCollection Date: 2023-12-01; DOI: 10.1093/jamiaopen/ooad098
Prashila Dullabh, Krysta K Heaney-Huls, Andrew B Chiao, Melissa G Callaham, Priyanka Desai, Nicole A Gauthreaux, Nitu Kashyap, David F Lobach, Aziz Boxwala
Remote monitoring of women experiencing hypertensive disorders of pregnancy (HDP) can provide timely life-saving data, particularly if these data are integrated into existing patient and clinical workflows. This pilot intervention of a smartphone application (app) for postpartum monitoring of hypertensive disorders integrates patient-contributed data into electronic health records (EHRs) to support monitoring and clinical decision-making. Results from the evaluation of the pilot highlight the resources needed when implementing the app, challenges for integrating an app into the EHR, and the usability and utility of the HDP monitoring app for patient and clinician users. The implementation team's key observations included the importance of a local clinical champion, more robust patient involvement and support for the remote patient monitoring program, an impetus for EHR developers to adopt data integration standards, and a need to expand the capabilities of the standards to support interventions using patient-contributed data.
Title: Implementation and evaluation of an electronic health record-integrated app for postpartum monitoring of hypertensive disorders of pregnancy using patient-contributed data collection. Journal: JAMIA Open, vol. 6, no. 4, ooad098. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646567/pdf/
Pub Date: 2023-11-08; eCollection Date: 2023-12-01; DOI: 10.1093/jamiaopen/ooad093
Matthew Cannon, James Stevenson, Kori Kuzma, Susanna Kiwala, Jeremy L Warner, Obi L Griffith, Malachi Griffith, Alex H Wagner
Objective: The diversity of nomenclature and naming strategies makes therapeutic terminology difficult to manage and harmonize. As the number and complexity of available therapeutic ontologies continue to increase, the need for harmonized cross-resource mappings is becoming increasingly apparent. This study creates harmonized concept mappings that link like concepts despite source-dependent differences in data structure or semantic representation.
Materials and methods: For this study, we created Thera-Py, a Python package and web API that constructs searchable concepts for drugs and therapeutic terminologies using 9 public resources and thesauri. By using a directed graph approach, Thera-Py captures commonly used aliases, trade names, annotations, and associations for any given therapeutic and combines them under a single concept record.
Results: We highlight the creation of 16 069 unique merged therapeutic concepts from 9 distinct sources using Thera-Py and observe an increase in the overlap of therapeutic concepts in 2 or more knowledge bases after harmonization using Thera-Py (from 9.8% to 41.8%).
Conclusion: We observe that Thera-Py tends to normalize therapeutic concepts to their underlying active ingredients (excluding nondrug therapeutics, eg, radiation therapy, biologics), and unifies all available descriptors regardless of ontological origin.
Title: Normalization of drug and therapeutic concepts with Thera-Py. Journal: JAMIA Open, vol. 6, no. 4, ooad093. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10637840/pdf/
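Editorial illustration: the abstract describes combining aliases, trade names, and associations for a therapeutic under a single concept record. The sketch below is not Thera-Py's API or its directed graph algorithm; it swaps in a simpler technique (undirected connected components via union-find) to show the general idea of merging records that cross-reference each other. The record layout, the source prefixes (`srcA`, `srcB`, `srcC`), and the function name `merge_concepts` are all hypothetical.

```python
from collections import defaultdict


def merge_concepts(records):
    """Group records linked by any cross-reference into one merged concept.

    Each record is a dict with "id", "label", "aliases", and "xrefs"
    (a hypothetical layout, not Thera-Py's schema).
    """
    parent = {}

    def find(node):
        # Locate the representative of node's group, shortening paths as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every cross-reference joins two records into the same group.
    for rec in records:
        find(rec["id"])
        for ref in rec["xrefs"]:
            union(rec["id"], ref)

    groups = defaultdict(list)
    for rec in records:
        groups[find(rec["id"])].append(rec)

    merged = []
    for recs in groups.values():
        names = {r["label"].lower() for r in recs}
        names |= {alias.lower() for r in recs for alias in r["aliases"]}
        merged.append({"ids": sorted(r["id"] for r in recs), "names": sorted(names)})
    return sorted(merged, key=lambda concept: concept["ids"])


# Hypothetical records from three made-up sources.
RECORDS = [
    {"id": "srcA:1", "label": "imatinib", "aliases": ["STI-571"], "xrefs": ["srcB:9"]},
    {"id": "srcB:9", "label": "Imatinib", "aliases": ["Gleevec"], "xrefs": []},
    {"id": "srcC:4", "label": "Gleevec", "aliases": [], "xrefs": ["srcB:9"]},
    {"id": "srcA:2", "label": "aspirin", "aliases": ["acetylsalicylic acid"], "xrefs": []},
]

for concept in merge_concepts(RECORDS):
    print(concept["ids"], concept["names"])
```

In this toy input the three imatinib records collapse into one concept because each is reachable from the others through a cross-reference, while the aspirin record stays separate.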
Pub Date: 2023-11-02; eCollection Date: 2023-12-01; DOI: 10.1093/jamiaopen/ooad092
Majid Afshar, Madeline Oguss, Thomas A Callaci, Timothy Gruenloh, Preeti Gupta, Claire Sun, Askar Safipour Afshar, Joseph Cavanaugh, Matthew M Churpek, Edwin Nyakoe-Nyasani, Huong Nguyen-Hilfiger, Ryan Westergaard, Elizabeth Salisbury-Afshar, Megan Gussick, Brian Patterson, Claire Manneh, Jomol Mathew, Anoop Mayampurath
Objectives: Substance misuse is a complex and heterogeneous set of conditions associated with high mortality and regional/demographic variations. Existing data systems are siloed and have been ineffective in curtailing the substance misuse epidemic. Therefore, we aimed to build a novel informatics platform, the Substance Misuse Data Commons (SMDC), by integrating multiple data modalities to provide a unified record of information crucial to improving outcomes in substance misuse patients.
Materials and methods: The SMDC was created by linking electronic health record (EHR) data from adult cases of substance (alcohol, opioid, nonopioid drug) misuse at the University of Wisconsin hospitals to socioeconomic and state agency data. To ensure private and secure data exchange, Privacy-Preserving Record Linkage (PPRL) and Honest Broker services were utilized. The overlap in mortality reporting among the EHR, state Vital Statistics, and a commercial national data source was assessed.
Results: The SMDC included data from 36 522 patients experiencing 62 594 healthcare encounters. Over half of patients were linked to the statewide ambulance database and prescription drug monitoring program. Chronic diseases accounted for most underlying causes of death, while drug-related overdoses constituted 8%. Our analysis of mortality revealed a 49.1% overlap across the 3 data sources. Nonoverlapping deaths were associated with poor socioeconomic indicators.
Discussion: Through PPRL, the SMDC enabled the longitudinal integration of multimodal data. Combining death data from local, state, and national sources enhanced mortality tracking and exposed disparities.
Conclusion: The SMDC provides a comprehensive resource for clinical providers and policymakers to inform interventions targeting substance misuse-related hospitalizations, overdoses, and death.
Title: Creation of a data commons for substance misuse related health research through privacy-preserving patient record linkage between hospitals and state agencies. Journal: JAMIA Open, vol. 6, no. 4, ooad092. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10629613/pdf/
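Editorial illustration: the abstract does not say which PPRL technique the SMDC used, so the following is only a minimal sketch of one common idea, under stated assumptions. Each site turns normalized identifiers into keyed-hash tokens with a shared secret, and a third party (the role an honest broker might play) compares tokens without seeing names or birth dates. The field names, the key, the record IDs, and the exact-match strategy are all hypothetical; production systems typically add error-tolerant matching.

```python
import hashlib
import hmac


def tokenize(records, secret):
    """Map keyed-hash tokens to local record IDs; only tokens leave the site."""
    tokens = {}
    for rec in records:
        # Normalize so trivial differences in case or spacing do not block a match.
        normalized = "|".join(
            rec[field].strip().lower() for field in ("first", "last", "dob")
        )
        digest = hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256)
        tokens[digest.hexdigest()] = rec["record_id"]
    return tokens


def match(tokens_a, tokens_b):
    """Pair record IDs whose tokens are identical (exact-match linkage)."""
    shared = tokens_a.keys() & tokens_b.keys()
    return sorted((tokens_a[token], tokens_b[token]) for token in shared)


# Hypothetical key and made-up records, for illustration only.
SECRET = b"hypothetical-shared-key"
EHR = [
    {"record_id": "ehr-1", "first": "Ana", "last": "Lopez", "dob": "1980-02-03"},
    {"record_id": "ehr-2", "first": "Sam", "last": "Reed", "dob": "1975-11-30"},
]
AGENCY = [
    {"record_id": "ems-7", "first": " ana ", "last": "LOPEZ", "dob": "1980-02-03"},
    {"record_id": "ems-8", "first": "Kim", "last": "Chen", "dob": "1990-06-15"},
]

pairs = match(tokenize(EHR, SECRET), tokenize(AGENCY, SECRET))
print(pairs)
```

Using a keyed hash (HMAC) rather than a plain hash means a party without the secret cannot rebuild tokens by hashing guessed names and dates.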
Pub Date: 2023-10-27; eCollection Date: 2023-12-01; DOI: 10.1093/jamiaopen/ooad090
Kamil Can Kural, Ilya Mazo, Mark Walderhaug, Luis Santana-Quintero, Konstantinos Karagiannis, Elaine E Thompson, Jeffrey A Kelman, Ravi Goud
Objective: Anaphylaxis is a severe life-threatening allergic reaction, and its accurate identification in healthcare databases can harness the potential of "Big Data" for healthcare or public health purposes.
Methods: This study used claims data obtained between October 1, 2015 and February 28, 2019 from the CMS database to examine the utility of machine learning in identifying incident anaphylaxis cases. We created a feature selection pipeline to identify critical features between different datasets. Then a variety of unsupervised and supervised methods were used (eg, Sammon mapping and eXtreme Gradient Boosting) to train models on datasets of differing data quality, which reflects the varying availability and potential rarity of ground truth data in medical databases.
Results: Resulting machine learning model accuracies ranged between 47.7% and 94.4% when tested on ground truth data. Finally, we found new features to help experts enhance existing case-finding algorithms.
Discussion: Developing precise algorithms to detect medical outcomes in claims can be a laborious and expensive process, particularly for conditions presented and coded diversely. We found it beneficial to filter out highly potent codes used for data curation to identify underlying patterns and features. To improve rule-based algorithms where necessary, researchers could use model explainers to determine noteworthy features, which could then be shared with experts and included in the algorithm.
Conclusion: Our work suggests that machine learning models can perform at levels similar to a previously published expert case-finding algorithm, while also having the potential to improve performance or streamline algorithm construction by identifying new relevant features.
Title: Using machine learning to improve anaphylaxis case identification in medical claims data. Journal: JAMIA Open, vol. 6, no. 4, ooad090. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10611436/pdf/
Pub Date: 2023-10-24; eCollection Date: 2023-12-01; DOI: 10.1093/jamiaopen/ooad087
Boguang Sun, Pui Ying Yew, Chih-Lin Chi, Meijia Song, Matt Loth, Rui Zhang, Robert J Straka
Importance: Statins are widely prescribed cholesterol-lowering medications in the United States, but their clinical benefits can be diminished by statin-associated muscle symptoms (SAMS), leading to discontinuation.
Objectives: In this study, we aimed to develop and validate a pharmacological SAMS clinical phenotyping algorithm using electronic health records (EHRs) data from Minnesota Fairview.
Materials and methods: We retrieved structured and unstructured EHR data of statin users and manually ascertained a gold standard set of SAMS cases and controls using the published SAMS-Clinical Index tool from clinical notes in 200 patients. We developed machine learning algorithms and rule-based algorithms that incorporated various criteria, including ICD codes, statin allergy, creatine kinase elevation, and keyword mentions in clinical notes. We applied the best-performing algorithm to the statin cohort to identify SAMS.
Results: We identified 16 889 patients who started statins in the Fairview EHR system from 2010 to 2020. The combined rule-based (CRB) algorithm, which utilized both clinical notes and structured data criteria, achieved performance similar to that of the machine learning algorithms, with a precision of 0.85, recall of 0.71, and F1 score of 0.77 against the gold standard set. Applying the CRB algorithm to the statin cohort, we identified a pharmacological SAMS prevalence of 1.9% and select risk factors, including female gender, coronary artery disease, hypothyroidism, and use of immunosuppressants or fibrates.
Discussion and conclusion: Our study developed and validated a simple pharmacological SAMS phenotyping algorithm that can be used to create SAMS case/control cohorts for further analysis, which can lead to the development of a SAMS risk prediction model.
Title: Development and application of pharmacological statin-associated muscle symptoms phenotyping algorithms using structured and unstructured electronic health records data. Journal: JAMIA Open, vol. 6, no. 4, ooad087. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/de/c5/ooad087.PMC10597587.pdf
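Editorial illustration: the reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the two figures given in the abstract (0.85 and 0.71); the function name `f1_score` is ours and is not taken from the study's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the combined rule-based algorithm.
f1 = f1_score(0.85, 0.71)
print(round(f1, 2))  # 0.77
```

The result, 2 x 0.85 x 0.71 / (0.85 + 0.71) = 1.207 / 1.56, rounds to 0.77, consistent with the abstract.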
Pub Date: 2023-10-17; eCollection Date: 2023-12-01; DOI: 10.1093/jamiaopen/ooad088
Terika McCall, Meagan Foster, Holly R Tomlin, Todd A Schwartz
Objectives: This study aimed to understand Black American women's attitudes toward seeking mental health services and using mobile technology to receive support for managing anxiety.
Methods: A self-administered web-based questionnaire was launched in October 2019 and closed in January 2020. Women who identified as Black/African American were eligible to participate. The survey consisted of approximately 70 questions and covered topics such as attitudes toward seeking professional psychological help, acceptability of using a mobile phone to receive mental health care, and screening for anxiety.
Results: The findings of the study (N = 395) showed that younger Black women were more likely to have greater severity of anxiety than their older counterparts. Respondents were most comfortable with using a voice call or video call, rather than text messaging or a mobile app, to communicate with a professional to receive support to manage anxiety. Younger age, higher income, and greater scores for psychological openness and help-seeking propensity increased the odds of indicating agreement with using mobile technology to communicate with a professional. Black women in the Southern region of the United States had twice the odds of agreeing to the use of mobile apps compared with women in the Midwest and Northeast regions.
Discussion: Black American women, in general, have favorable views toward the use of mobile technology to receive support to manage anxiety.
Conclusion: Preferences and cultural appropriateness of resources should be assessed on an individual basis to increase likelihood of adoption and engagement with digital mental health interventions for management of anxiety.
Title: Black American women's attitudes toward seeking mental health services and use of mobile technology to support the management of anxiety. Journal: JAMIA Open, vol. 6, no. 4, ooad088. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10582519/pdf/
Pub Date: 2023-10-17; eCollection Date: 2023-12-01; DOI: 10.1093/jamiaopen/ooad084
Muntaha Samad, Mirana Angel, Joseph Rinehart, Yuzo Kanomata, Pierre Baldi, Maxime Cannesson
Objectives: Artificial intelligence (AI) holds great promise for transforming the healthcare industry. However, despite its potential, AI is yet to see widespread deployment in clinical settings in significant part due to the lack of publicly available clinical data and the lack of transparency in the published AI algorithms. There are few clinical data repositories publicly accessible to researchers to train and test AI algorithms, and even fewer that contain specialized data from the perioperative setting. To address this gap, we present and release the Medical Informatics Operating Room Vitals and Events Repository (MOVER).
Materials and methods: This first release of MOVER includes adult patients who underwent surgery at the University of California, Irvine Medical Center from 2015 to 2022. Data for patients who underwent surgery were captured from 2 different sources: High-fidelity physiological waveforms from all of the operating rooms were captured in real time and matched with electronic medical record data.
Results: MOVER includes data from 58 799 unique patients and 83 468 surgeries. MOVER is available for download at https://doi.org/10.24432/C5VS5G. It can be downloaded by anyone who signs a data usage agreement (DUA), a requirement intended to restrict traffic to legitimate researchers.
Discussion: To the best of our knowledge, MOVER is the only freely available public data repository that contains electronic health record and high-fidelity physiological waveform data for patients undergoing surgery.
Conclusion: MOVER is freely available to all researchers who sign a DUA, and we hope that it will accelerate the integration of AI into healthcare settings, ultimately leading to improved patient outcomes.
Title: Medical Informatics Operating Room Vitals and Events Repository (MOVER): a public-access operating room database. Journal: JAMIA Open, vol. 6, no. 4, ooad084. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10582520/pdf/ooad084.pdf
Pub Date : 2023-10-09 eCollection Date: 2023-12-01 DOI: 10.1093/jamiaopen/ooad086
Barrett W Jones, Warren D Taylor, Colin G Walsh
Objectives: We evaluated autoencoders as a feature engineering and pretraining technique to improve major depressive disorder (MDD) prognostic risk prediction. Autoencoders can represent temporal feature relationships not identified by aggregate features. The predictive performance of autoencoders of multiple sequential structures was evaluated as feature engineering and pretraining strategies on an array of prediction tasks and compared to a restricted Boltzmann machine (RBM) and random forests as a benchmark.
Materials and methods: We studied MDD patients from Vanderbilt University Medical Center. Autoencoder models with Attention and long short-term memory (LSTM) layers were trained to create latent representations of the input data. Predictive performance was evaluated temporally in two ways: by fitting random forest models to predict future outcomes with the engineered features as input, and by using autoencoder weights to initialize neural network layers. We evaluated area under the precision-recall curve (AUPRC) trends and variation over the study population's treatment course.
Results: The pretrained LSTM model improved predictive performance over pretrained Attention models and benchmarks in 3 of 4 outcomes including self-harm/suicide attempt (AUPRCs, LSTM pretrained = 0.012, Attention pretrained = 0.010, RBM = 0.009, random forest = 0.005). The use of autoencoders for feature engineering had varied results, with benchmarks outperforming LSTM and Attention encodings on the self-harm/suicide attempt outcome (AUPRCs, LSTM encodings = 0.003, Attention encodings = 0.004, RBM = 0.009, random forest = 0.005).
Discussion: Improvement in prediction resulting from pretraining has the potential for increased clinical impact of MDD risk models. We did not find evidence that the use of temporal feature encodings was additive to predictive performance in the study population. This suggests that predictive information retained by model weights may be lost during encoding. LSTM pretrained model predictive performance is shown to be clinically useful and improves over state-of-the-art predictors in the MDD phenotype. LSTM model performance warrants consideration of use in future related studies.
Conclusion: LSTM models with pretrained weights from autoencoders were able to outperform the benchmark and a pretrained Attention model. Future researchers developing risk models in MDD may benefit from the use of LSTM autoencoder pretrained weights.
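The AUPRC values reported above can be read more easily with the metric's definition in hand. The following is a minimal sketch of step-wise average precision, one common estimator of AUPRC; it is an illustration only, not necessarily the estimator the authors used, and the function name and toy data are hypothetical. For a rare outcome, a random ranking yields an AUPRC close to the outcome prevalence, which is why small absolute values can still represent an improvement over a baseline.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise average precision, a common estimator of AUPRC.

    Hypothetical helper for illustration. Tied scores are ordered
    arbitrarily here, so results may differ slightly from library
    implementations when many scores are equal.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")

    # Rank examples from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where recall increases by 1 / n_pos.
            total += true_positives / rank
    return total / n_pos


if __name__ == "__main__":
    # Toy labels and scores, invented for this example.
    labels = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print(round(average_precision(labels, risk), 4))
```

Comparing this value for two models on the same test set, as the abstract does for the LSTM, Attention, RBM, and random forest models, only makes sense when the outcome prevalence is identical across the comparisons.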
{"title":"Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction.","authors":"Barrett W Jones, Warren D Taylor, Colin G Walsh","doi":"10.1093/jamiaopen/ooad086","DOIUrl":"10.1093/jamiaopen/ooad086","url":null,"abstract":"<p><strong>Objectives: </strong>We evaluated autoencoders as a feature engineering and pretraining technique to improve major depressive disorder (MDD) prognostic risk prediction. Autoencoders can represent temporal feature relationships not identified by aggregate features. The predictive performance of autoencoders of multiple sequential structures was evaluated as feature engineering and pretraining strategies on an array of prediction tasks and compared to a restricted Boltzmann machine (RBM) and random forests as a benchmark.</p><p><strong>Materials and methods: </strong>We study MDD patients from Vanderbilt University Medical Center. Autoencoder models with Attention and long-short-term memory (LSTM) layers were trained to create latent representations of the input data. Predictive performance was evaluated temporally by fitting random forest models to predict future outcomes with engineered features as input and using autoencoder weights to initialize neural network layers. We evaluated area under the precision-recall curve (AUPRC) trends and variation over the study population's treatment course.</p><p><strong>Results: </strong>The pretrained LSTM model improved predictive performance over pretrained Attention models and benchmarks in 3 of 4 outcomes including self-harm/suicide attempt (AUPRCs, LSTM pretrained = 0.012, Attention pretrained = 0.010, RBM = 0.009, random forest = 0.005). 
The use of autoencoders for feature engineering had varied results, with benchmarks outperforming LSTM and Attention encodings on the self-harm/suicide attempt outcome (AUPRCs, LSTM encodings = 0.003, Attention encodings = 0.004, RBM = 0.009, random forest = 0.005).</p><p><strong>Discussion: </strong>Improvement in prediction resulting from pretraining has the potential for increased clinical impact of MDD risk models. We did not find evidence that the use of temporal feature encodings was additive to predictive performance in the study population. This suggests that predictive information retained by model weights may be lost during encoding. LSTM pretrained model predictive performance is shown to be clinically useful and improves over state-of-the-art predictors in the MDD phenotype. LSTM model performance warrants consideration of use in future related studies.</p><p><strong>Conclusion: </strong>LSTM models with pretrained weights from autoencoders were able to outperform the benchmark and a pretrained Attention model. Future researchers developing risk models in MDD may benefit from the use of LSTM autoencoder pretrained weights.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"6 4","pages":"ooad086"},"PeriodicalIF":2.1,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10561992/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41214963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-04 eCollection Date: 2023-12-01 DOI: 10.1093/jamiaopen/ooad085
Geoffrey M Gray, Ayah Zirikly, Luis M Ahumada, Masoud Rouhizadeh, Thomas Richards, Christopher Kitchen, Iman Foroughmand, Elham Hatef
Objectives: To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs).
Materials and methods: We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rules-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm using an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we tested the adjustments by evaluating false positives and negatives identified in the labeled dataset. We assessed the performance of the algorithm using measures of precision, recall, and F1 score.
Results: The algorithm for identifying residential instability had the best overall performance, with a weighted average for precision, recall, and F1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. Metrics for the food insecurity algorithm were high, but the transportation issues algorithm had the lowest overall performance.
Discussion: The NLP algorithm for identifying social needs at JHHS performed relatively well and would provide the opportunity for implementation in a healthcare system.
Conclusion: The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system.
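The rule-based keyword matching and the precision, recall, and F1 evaluation described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the lexicon terms, domain keys, and function names are hypothetical and are not the study's Match Pipeline or its curated keyword sets.

```python
import re
from typing import Dict, List, Pattern, Set, Tuple

# Hypothetical keyword sets, invented for illustration only.
LEXICONS: Dict[str, List[str]] = {
    "residential_instability": ["homeless", "shelter", "evicted"],
    "food_insecurity": ["food insecurity", "food pantry", "skips meals"],
    "transportation_issues": ["no transportation", "missed the bus", "lacks a ride"],
}


def compile_lexicon(terms: List[str]) -> Pattern[str]:
    """Build one case-insensitive, whole-phrase pattern per domain."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERNS: Dict[str, Pattern[str]] = {
    domain: compile_lexicon(terms) for domain, terms in LEXICONS.items()
}


def match_domains(note: str) -> Set[str]:
    """Return every social needs domain with at least one keyword hit."""
    return {domain for domain, pattern in PATTERNS.items() if pattern.search(note)}


def precision_recall_f1(gold: List[bool], pred: List[bool]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one domain against annotations."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    note = "Patient is currently homeless and uses a food pantry."
    print(sorted(match_domains(note)))
```

Plain keyword matching of this kind does not handle negation ("denies being homeless") or morphological variants, which is one reason an iterative review of false positives and false negatives against a labeled dataset matters when refining the keyword sets.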
{"title":"Application of natural language processing to identify social needs from patient medical notes: development and assessment of a scalable, performant, and rule-based model in an integrated healthcare delivery system.","authors":"Geoffrey M Gray, Ayah Zirikly, Luis M Ahumada, Masoud Rouhizadeh, Thomas Richards, Christopher Kitchen, Iman Foroughmand, Elham Hatef","doi":"10.1093/jamiaopen/ooad085","DOIUrl":"10.1093/jamiaopen/ooad085","url":null,"abstract":"<p><strong>Objectives: </strong>To develop and test a scalable, performant, and rule-based model for identifying 3 major domains of social needs (residential instability, food insecurity, and transportation issues) from the unstructured data in electronic health records (EHRs).</p><p><strong>Materials and methods: </strong>We included patients aged 18 years or older who received care at the Johns Hopkins Health System (JHHS) between July 2016 and June 2021 and had at least 1 unstructured (free-text) note in their EHR during the study period. We used a combination of manual lexicon curation and semiautomated lexicon creation for feature development. We developed an initial rules-based pipeline (Match Pipeline) using 2 keyword sets for each social needs domain. We performed rule-based keyword matching for distinct lexicons and tested the algorithm using an annotated dataset comprising 192 patients. Starting with a set of expert-identified keywords, we tested the adjustments by evaluating false positives and negatives identified in the labeled dataset. We assessed the performance of the algorithm using measures of precision, recall, and <i>F</i>1 score.</p><p><strong>Results: </strong>The algorithm for identifying residential instability had the best overall performance, with a weighted average for precision, recall, and <i>F</i>1 score of 0.92, 0.84, and 0.92 for identifying patients with homelessness and 0.84, 0.82, and 0.79 for identifying patients with housing insecurity. 
Metrics for the food insecurity algorithm were high but the transportation issues algorithm was the lowest overall performing metric.</p><p><strong>Discussion: </strong>The NLP algorithm in identifying social needs at JHHS performed relatively well and would provide the opportunity for implementation in a healthcare system.</p><p><strong>Conclusion: </strong>The NLP approach developed in this project could be adapted and potentially operationalized in the routine data processes of a healthcare system.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"6 4","pages":"ooad085"},"PeriodicalIF":2.1,"publicationDate":"2023-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/2e/eb/ooad085.PMC10550267.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41168703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-04 DOI: 10.1093/jamiaopen/ooad107
Stephanie Teeple, Aria G. Smith, Matthew F. Toerper, Scott Levin, Scott Halpern, Oluwakemi Badaki‐Makun, J. Hinson
To investigate how missing data in the patient problem list may impact racial disparities in the predictive performance of a machine learning (ML) model for emergency department (ED) triage. Racial disparities may exist in the missingness of EHR data (eg, systematic differences in access, testing, and/or treatment) that can impact model predictions across racialized patient groups. We use an ML model that predicts patients’ risk for adverse events to produce triage-level recommendations, patterned after a clinical decision support tool deployed at multiple EDs. We compared the model’s predictive performance on sets of observed (problem list data at the point of triage) versus manipulated (updated to the more complete problem list at the end of the encounter) test data. These differences were compared between Black and non-Hispanic White patient groups using multiple performance measures relevant to health equity. There were modest, but significant, changes in predictive performance comparing the observed to manipulated models across both Black and non-Hispanic White patient groups; c-statistic improvement ranged between 0.027 and 0.058. The manipulation produced no between-group differences in c-statistic by race. However, there were small between-group differences in other performance measures, with greater change for non-Hispanic White patients. Problem list missingness impacted model performance for both patient groups, with marginal differences detected by race. Further exploration is needed to examine how missingness may contribute to racial disparities in clinical model predictions across settings. The novel manipulation method demonstrated may aid future research.
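The comparison described above, c-statistic on observed versus manipulated test data within each patient group, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: the function names, group labels, and toy data are hypothetical, and the pairwise computation is chosen for clarity rather than speed.

```python
from typing import Dict, List, Sequence


def c_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outranks a random negative case.

    Ties count as half. This pairwise form is O(n_pos * n_neg).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("c-statistic needs both positive and negative cases")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def c_statistic_change_by_group(
    y_true: Sequence[int],
    observed: Sequence[float],
    manipulated: Sequence[float],
    groups: Sequence[str],
) -> Dict[str, float]:
    """Per-group change in c-statistic (manipulated minus observed)."""
    changes: Dict[str, float] = {}
    for group in sorted(set(groups)):
        idx: List[int] = [i for i, g in enumerate(groups) if g == group]
        y = [y_true[i] for i in idx]
        before = c_statistic(y, [observed[i] for i in idx])
        after = c_statistic(y, [manipulated[i] for i in idx])
        changes[group] = after - before
    return changes


if __name__ == "__main__":
    # Toy data, invented for this example; "A" and "B" are placeholder groups.
    y = [0, 0, 1, 1, 0, 0, 1, 1]
    obs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.5, 0.7]
    man = [0.1, 0.3, 0.35, 0.8, 0.2, 0.4, 0.5, 0.7]
    grp = ["A", "A", "A", "A", "B", "B", "B", "B"]
    print(c_statistic_change_by_group(y, obs, man, grp))
```

A between-group difference in this change, rather than the change itself, is what would indicate that problem list missingness affects groups unequally. A real analysis would also need uncertainty estimates, for example bootstrap confidence intervals, which this sketch omits.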
{"title":"Exploring the impact of missingness on racial disparities in predictive performance of a machine learning model for emergency department triage","authors":"Stephanie Teeple, Aria G. Smith, Matthew F. Toerper, Scott Levin, Scott Halpern, Oluwakemi Badaki‐Makun, J. Hinson","doi":"10.1093/jamiaopen/ooad107","DOIUrl":"https://doi.org/10.1093/jamiaopen/ooad107","url":null,"abstract":"To investigate how missing data in the patient problem list may impact racial disparities in the predictive performance of a machine learning (ML) model for emergency department (ED) triage. Racial disparities may exist in the missingness of EHR data (eg, systematic differences in access, testing, and/or treatment) that can impact model predictions across racialized patient groups. We use an ML model that predicts patients’ risk for adverse events to produce triage-level recommendations, patterned after a clinical decision support tool deployed at multiple EDs. We compared the model’s predictive performance on sets of observed (problem list data at the point of triage) versus manipulated (updated to the more complete problem list at the end of the encounter) test data. These differences were compared between Black and non-Hispanic White patient groups using multiple performance measures relevant to health equity. There were modest, but significant, changes in predictive performance comparing the observed to manipulated models across both Black and non-Hispanic White patient groups; c-statistic improvement ranged between 0.027 and 0.058. The manipulation produced no between-group differences in c-statistic by race. However, there were small between-group differences in other performance measures, with greater change for non-Hispanic White patients. Problem list missingness impacted model performance for both patient groups, with marginal differences detected by race. 
Further exploration is needed to examine how missingness may contribute to racial disparities in clinical model predictions across settings. The novel manipulation method demonstrated may aid future research.","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"13 1","pages":""},"PeriodicalIF":2.1,"publicationDate":"2023-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139323583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}