Tao Wang, David Codling, Yamiko Joseph Msosa, Matthew Broadbent, Daisy Kornblum, Catherine Polling, Thomas Searle, Claire Delaney-Pope, Barbara Arroyo, Stuart MacLellan, Zoe Keddie, Mary Docherty, Angus Roberts, Robert Stewart, Philip McGuire, Richard Dobson, Robert Harland
Objective: This proof-of-concept study aimed to design and implement Visual & Interactive Engagement With Electronic Records (VIEWER), a versatile toolkit for visual analytics of clinical data, and to systematically evaluate its effectiveness across various clinical applications while gathering feedback for iterative improvement.
Materials and methods: VIEWER is an open-source and extensible toolkit that employs natural language processing and interactive visualization techniques to facilitate the rapid design, development, and deployment of clinical information retrieval, analysis, and visualization at the point of care. Through an iterative and collaborative participatory design approach, VIEWER was designed and implemented in one of the United Kingdom's largest National Health Service (NHS) mental health Trusts, where its clinical utility and effectiveness were assessed using both quantitative and qualitative methods.
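The abstract does not include implementation details; as a loose illustration of the general pattern it describes (combining structured clinical events with NLP-extracted note mentions into one longitudinal patient view), the sketch below uses pandas and matplotlib. All column names, events, and dates are hypothetical, and this is not VIEWER code.

```python
# Minimal sketch (not VIEWER code): merge structured clinical events with
# NLP-extracted note mentions into a single longitudinal patient timeline.
# All column names, events, and dates below are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

structured = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-05", "2023-03-12", "2023-06-20"]),
    "event": ["Admission", "Medication change", "Discharge"],
    "source": "structured record",
})
nlp_mentions = pd.DataFrame({
    "date": pd.to_datetime(["2023-02-01", "2023-04-18"]),
    "event": ["Low mood (note mention)", "Poor sleep (note mention)"],
    "source": "clinical notes (NLP)",
})
timeline = pd.concat([structured, nlp_mentions]).sort_values("date")

# Plot each data source on its own row of a shared time axis.
y_positions = {src: i for i, src in enumerate(timeline["source"].unique())}
fig, ax = plt.subplots(figsize=(9, 2.5))
for src, grp in timeline.groupby("source"):
    y = [y_positions[src]] * len(grp)
    ax.scatter(grp["date"], y, label=src)
    for x, yy, label in zip(grp["date"], y, grp["event"]):
        ax.annotate(label, (x, yy), textcoords="offset points",
                    xytext=(0, 8), fontsize=8, ha="center")
ax.set_yticks(list(y_positions.values()))
ax.set_yticklabels(list(y_positions.keys()))
ax.set_title("Longitudinal patient timeline (illustrative)")
plt.tight_layout()
plt.show()
```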
Results: VIEWER provides interactive, problem-focused, and comprehensive views of longitudinal patient data (n = 409 870) from a combination of structured clinical data and unstructured clinical notes. Despite a relatively short adoption period and users' initial unfamiliarity, VIEWER significantly improved performance and task completion speed compared to the standard clinical information system. More than 1000 users and partners in the hospital tested and used VIEWER, reporting high satisfaction and expressing strong interest in incorporating it into their daily practice.
Discussion: VIEWER provides a cost-effective enhancement to the functionalities of standard clinical information systems, with evaluation offering valuable feedback for future improvements.
Conclusion: VIEWER was developed to improve data accessibility and representation across various aspects of healthcare delivery, including population health management and patient monitoring. The deployment of VIEWER highlights the benefits of collaborative refinement in optimizing health informatics solutions for enhanced patient care.
VIEWER: an extensible visual analytics framework for enhancing mental healthcare. Journal of the American Medical Informatics Association, January 23, 2025. doi:10.1093/jamia/ocaf010
Madelena Y Ng, Jarrod Helzer, Michael A Pfeffer, Tina Seto, Tina Hernandez-Boussard
Background: Generative AI, particularly large language models (LLMs), holds great potential for improving patient care and operational efficiency in healthcare. However, the use of LLMs is complicated by regulatory concerns around data security and patient privacy. This study aimed to develop and evaluate a secure infrastructure that allows researchers to safely leverage LLMs in healthcare while ensuring HIPAA compliance and promoting equitable AI.
Materials and methods: We implemented a private Azure OpenAI Studio deployment with secure API-enabled endpoints for researchers. Two use cases were explored: detecting falls from electronic health record (EHR) notes and evaluating bias in mental health prediction using fairness-aware prompts.
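The paper's exact configuration is not reproduced here; the snippet below is a minimal sketch of querying a private Azure OpenAI deployment through the official openai Python SDK. The endpoint, key handling, deployment name, API version, and prompt are placeholders, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): calling a private Azure OpenAI
# deployment via the official openai SDK (>=1.0). Endpoint, key source,
# deployment name, and prompt are placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # institution-hosted endpoint
    api_key=os.environ["AZURE_OPENAI_API_KEY"],          # kept out of source control
    api_version="2024-02-01",
)

note_excerpt = "Patient found on floor next to bed this morning; denies head strike."
response = client.chat.completions.create(
    model="my-private-gpt4-deployment",  # deployment name, not a public model name
    messages=[
        {"role": "system", "content": "Answer yes or no: does this note describe a fall?"},
        {"role": "user", "content": note_excerpt},
    ],
    temperature=0,
)
print(response.choices[0].message.content)
```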
Results: The framework provided secure, HIPAA-compliant API access to LLMs, allowing researchers to handle sensitive data safely. Both use cases highlighted the secure infrastructure's capacity to protect sensitive patient data while supporting innovation.
Discussion and conclusion: This centralized platform presents a scalable, secure, and HIPAA-compliant solution for healthcare institutions aiming to integrate LLMs into clinical research.
Development of secure infrastructure for advancing generative artificial intelligence research in healthcare at an academic medical center. Journal of the American Medical Informatics Association, January 21, 2025. doi:10.1093/jamia/ocaf005
Muhammad Ali Khan, Umair Ayub, Syed Arsalan Ahmed Naqvi, Kaneez Zahra Rubab Khakwani, Zaryab Bin Riaz Sipra, Ammad Raina, Sihan Zhou, Huan He, Amir Saeidi, Bashar Hasan, Robert Bryan Rumble, Danielle S Bitterman, Jeremy L Warner, Jia Zou, Amye J Tevaarwerk, Konstantinos Leventakos, Kenneth L Kehl, Jeanne M Palmer, Mohammad Hassan Murad, Chitta Baral, Irbaz Bin Riaz
Objective: Data extraction from the published literature is the most laborious step in conducting living systematic reviews (LSRs). We aim to build a generalizable, automated data extraction workflow leveraging large language models (LLMs) that mimics the real-world 2-reviewer process.
Materials and methods: A dataset of 10 trials (22 publications) from a published LSR was used, focusing on 23 variables related to trial, population, and outcomes data. The dataset was split into prompt development (n = 5) and held-out test sets (n = 17). GPT-4-turbo and Claude-3-Opus were used for data extraction. Responses from the 2 LLMs were considered concordant if they were the same for a given variable. The discordant responses from each LLM were provided to the other LLM for cross-critique. Accuracy, ie, the total number of correct responses divided by the total number of responses, was computed to assess performance.
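As a minimal sketch of the 2-reviewer logic and accuracy metric described above, the snippet below abstracts the actual LLM calls (GPT-4-turbo, Claude-3-Opus) and prompts behind placeholder callables; the stub data in the demo is hypothetical.

```python
# Minimal sketch of the 2-reviewer workflow: accept concordant answers,
# send discordant ones for cross-critique, and compute accuracy as
# correct responses / total responses.
from typing import Callable

def accuracy(responses: dict[str, str], gold: dict[str, str]) -> float:
    """Total number of correct responses divided by the total number of responses."""
    return sum(responses[v] == gold[v] for v in responses) / len(responses)

def extract_with_cross_critique(
    variables: list[str],
    llm_a: Callable[[str], str],            # eg, GPT-4-turbo wrapped with the extraction prompt
    llm_b: Callable[[str], str],            # eg, Claude-3-Opus wrapped with the extraction prompt
    critique_a: Callable[[str, str], str],  # A reconsiders its answer given B's answer
    critique_b: Callable[[str, str], str],  # B reconsiders its answer given A's answer
) -> dict[str, str]:
    final: dict[str, str] = {}
    for var in variables:
        a, b = llm_a(var), llm_b(var)
        if a == b:                           # concordant: accept as-is
            final[var] = a
        else:                                # discordant: cross-critique
            a2, b2 = critique_a(var, b), critique_b(var, a)
            final[var] = a2 if a2 == b2 else f"DISCORDANT({a2}|{b2})"  # would go to a human reviewer
    return final

# Tiny demo with stub "LLMs" that disagree on one variable.
demo = extract_with_cross_critique(
    ["sample_size", "primary_endpoint"],
    llm_a=lambda v: {"sample_size": "250", "primary_endpoint": "OS"}[v],
    llm_b=lambda v: {"sample_size": "250", "primary_endpoint": "PFS"}[v],
    critique_a=lambda v, other: "PFS",   # A concedes after seeing B's answer
    critique_b=lambda v, other: "PFS",
)
print(demo, accuracy(demo, {"sample_size": "250", "primary_endpoint": "PFS"}))
```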
Results: In the prompt development set, 110 (96%) responses were concordant, achieving an accuracy of 0.99 against the gold standard. In the test set, 342 (87%) responses were concordant. The accuracy of the concordant responses was 0.94. The accuracy of the discordant responses was 0.41 for GPT-4-turbo and 0.50 for Claude-3-Opus. Of the 49 discordant responses, 25 (51%) became concordant after cross-critique, increasing accuracy to 0.76.
Discussion: Concordant responses by the LLMs are likely to be accurate. In instances of discordant responses, cross-critique can further increase the accuracy.
Conclusion: Large language models, when simulated in a collaborative, 2-reviewer workflow, can extract data with reasonable performance, enabling truly "living" systematic reviews.
Collaborative large language models for automated data extraction in living systematic reviews. Journal of the American Medical Informatics Association, January 21, 2025. doi:10.1093/jamia/ocae325
Agata Foryciarz, Nicole Gladish, David H Rehkopf, Sherri Rose
Objectives: The inclusion of social drivers of health (SDOH) into predictive algorithms of health outcomes has potential for improving algorithm interpretation, performance, generalizability, and transportability. However, there are limitations in the availability, understanding, and quality of SDOH variables, as well as a lack of guidance on how to incorporate them into algorithms when appropriate to do so. As such, few published algorithms include SDOH, and there is substantial methodological variability among those that do. We argue that practitioners should consider the use of social indices and factors (a class of area-level measurements), given their accessibility, transparency, and quality.
Results: We illustrate the process of using such indices in predictive algorithms, which includes the selection of appropriate indices for the outcome, measurement time, and geographic level, in a demonstrative example with the Kidney Failure Risk Equation.
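The article offers methodological guidance rather than code; as a minimal sketch of the mechanics it describes (linking an area-level index to patient records by a geographic identifier chosen to match the outcome, measurement time, and geographic level), the pandas example below uses entirely hypothetical identifiers, clinical variables, and index values.

```python
# Minimal sketch (not from the article): attach an area-level social index
# (eg, a deprivation index keyed by census tract) to patient-level EHR data
# so it can be evaluated as a candidate predictor. All values are hypothetical.
import pandas as pd

patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "egfr": [48.0, 35.5, 60.2],             # clinical predictors already in the model
    "census_tract": ["06075010100", "06075010200", "06075010300"],
})

area_index = pd.DataFrame({
    "census_tract": ["06075010100", "06075010200", "06075010300"],
    "deprivation_index": [0.82, 0.35, 0.57],  # measured for the outcome-relevant period
})

# Choose the geographic level and measurement period to match the prediction task,
# then merge so each patient inherits the index of their area of residence.
model_data = patients.merge(area_index, on="census_tract", how="left")
print(model_data)
```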
Discussion: Identifying settings where incorporating SDOH may be beneficial, and doing so rigorously, can help validate algorithms and assess generalizability.
Incorporating area-level social drivers of health in predictive algorithms using electronic health record data. Journal of the American Medical Informatics Association, January 20, 2025. doi:10.1093/jamia/ocaf009
Fangyi Chen, Gongbo Zhang, Yilu Fang, Yifan Peng, Chunhua Weng
Objective: Extracting PICO elements (Participants, Intervention, Comparison, and Outcomes) from clinical trial literature is essential for clinical evidence retrieval, appraisal, and synthesis. Existing approaches do not distinguish the attributes of PICO entities. This study aims to develop a named entity recognition (NER) model to extract PICO entities with fine granularities.
Materials and methods: Using a corpus of 2511 abstracts with PICO mentions from 4 public datasets, we developed a semi-supervised method to facilitate the training of a NER model, FinePICO, by combining limited annotated data of PICO entities and abundant unlabeled data. For evaluation, we divided the entire dataset into 2 subsets: a smaller group with annotations and a larger group without annotations. We then established the theoretical lower and upper performance bounds based on the performance of supervised learning models trained solely on the small, annotated subset and on the entire set with complete annotations, respectively. Finally, we evaluated FinePICO on both the smaller annotated subset and the larger, initially unannotated subset. We measured the performance of FinePICO using precision, recall, and F1.
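Precision, recall, and F1 here are standard metrics; the sketch below shows a span-level computation, assuming exact match on span boundaries and entity type (the abstract does not restate the authors' exact matching criteria), with hypothetical annotations.

```python
# Minimal sketch of span-level precision/recall/F1 for extracted entities,
# assuming exact match on (start, end, type); the paper's exact matching
# criteria are not reproduced here.
def precision_recall_f1(predicted: set[tuple], gold: set[tuple]) -> tuple[float, float, float]:
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical annotations: (start_char, end_char, entity_type)
gold = {(0, 12, "Participants"), (20, 31, "Intervention"), (40, 52, "Outcome")}
pred = {(0, 12, "Participants"), (20, 31, "Comparison"), (40, 52, "Outcome")}

p, r, f1 = precision_recall_f1(pred, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```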
Results: Our method achieved precision/recall/F1 of 0.567/0.636/0.60, respectively, using a small set of annotated samples, outperforming the baseline model (F1: 0.437) by more than 16%. The model also generalized to a different PICO framework and to another corpus, consistently outperforming the benchmark in diverse experimental settings (P < .001).
Discussion: We developed FinePICO to recognize fine-grained PICO entities from text and validated its performance across diverse experimental settings, highlighting the feasibility of using semi-supervised learning (SSL) techniques to enhance PICO entity extraction. Future work can focus on optimizing SSL algorithms to improve efficiency and reduce computational costs.
Conclusion: This study contributes a generalizable and effective semi-supervised approach leveraging large unlabeled data together with small, annotated data for fine-grained PICO extraction.
Semi-supervised learning from small annotated data and large unlabeled data for fine-grained Participants, Intervention, Comparison, and Outcomes entity recognition. Journal of the American Medical Informatics Association, January 17, 2025. doi:10.1093/jamia/ocae326
Michael R Cauley, Richard J Boland, S Trent Rosenbloom
Objective: To develop a framework that models the impact of electronic health record (EHR) systems on healthcare professionals' well-being and their relationships with patients, using interdisciplinary insights to guide machine learning in identifying value patterns important to healthcare professionals in EHR systems.
Materials and methods: A theoretical framework of EHR systems' implementation was developed using interdisciplinary literature from healthcare, information systems, and management science focusing on the systems approach, clinical decision-making, and interface terminologies.
Observations: Healthcare professionals balance personal norms of narrative and data-driven communication in knowledge creation for EHRs by integrating detailed patient stories with structured data. This integration forms 2 learning loops that create tension in the healthcare professional-patient relationship, shaping how healthcare professionals apply their values in care delivery. The manifestation of this value tension in EHRs directly affects the well-being of healthcare professionals.
Discussion: Understanding the value tension learning loop between structured data and narrative forms lays the groundwork for future studies of how healthcare professionals use EHRs to deliver care, emphasizing their well-being and patient relationships through a sociotechnical lens.
Conclusion: EHR systems can improve the healthcare professional-patient relationship and healthcare professional well-being by integrating norms and values into pattern recognition of narrative and data communication forms.
Interdisciplinary systems may restore the healthcare professional-patient relationship in electronic health systems. Journal of the American Medical Informatics Association, January 17, 2025. doi:10.1093/jamia/ocaf001
Helena Klara Jambor, Julian Ketges, Anna Lea Otto, Malte von Bonin, Karolin Trautmann-Grill, Raphael Teipel, Jan Moritz Middeke, Maria Uhlig, Martin Eichler, Sebastian Pannasch, Martin Bornhäuser
Objective: This study evaluated the legibility, comprehension, and clinical usability of visual timelines for communicating cancer treatment paths. We examined how these visual aids enhance participants' and patients' understanding of their treatment plans.
Materials and methods: The study included 2 online surveys and 1 in-person survey with hematology cancer patients. The online surveys involved 306 and 160 participants, respectively, while the clinical evaluation included 30 patients (11 re-surveyed) and 24 medical doctors. Participants were assessed on their ability to understand treatment paths provided with audio information alone or with visual aids. The study also evaluated the comprehensibility of key treatment terms and the ability of patients to recall their cancer treatment paths.
Results: Visual representations effectively communicated treatment terms, with 7 out of 8 terms achieving over 85% transparency as pictograms, compared to 5 out of 8 for comics and 4 out of 8 for photos. Visual treatment timelines improved the proportion of correct responses, increased confidence, and were rated higher in information quality than audio-only information. In the clinical evaluation, patients showed good comprehension (mean proportion correct: 0.82) and recall (mean proportion correct: 0.71 after several weeks), and both patients and physicians found the visual aids helpful.
Discussion: We discuss how visual timelines enhance patient comprehension and confidence in cancer communication, as well as the limitations of the online surveys and the clinical evaluation. The importance of accessible visual aids in patient consultations is emphasized, with potential benefits for diverse patient populations.
Conclusion: Visual aids in the form of treatment timelines improve the legibility and comprehension of cancer treatment paths. Both patients and physicians support integrating these tools into cancer treatment communication.
Communicating cancer treatment with pictogram-based timeline visualizations. Journal of the American Medical Informatics Association, January 16, 2025. doi:10.1093/jamia/ocae319
Matthew Steven Farmer, Mihail Popescu, Kimberly Powell
Objective: This study aimed to explore the utilization of a fine-tuned language model to extract expressions related to the Age-Friendly Health Systems 4M Framework (What Matters, Medication, Mentation, and Mobility) from nursing home worker text messages, deploy automated mapping of these expressions to a taxonomy, and explore the created expressions and relationships.
Materials and methods: The dataset included 21 357 text messages from healthcare workers in 12 Missouri nursing homes. A sample of 860 messages was annotated by clinical experts to form a "Gold Standard" dataset. Model performance was evaluated using classification metrics including Cohen's Kappa (κ), with κ ≥ 0.60 as the performance threshold. The selected model was fine-tuned. Extractions were clustered, labeled, and arranged into a structured taxonomy for exploration.
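As a minimal sketch of the agreement check described above, the snippet below compares hypothetical model-extracted 4M labels against gold-standard annotations using scikit-learn's Cohen's kappa and applies the κ ≥ 0.60 threshold; it is not the authors' evaluation code.

```python
# Minimal sketch: compare model-extracted 4M labels against expert
# "Gold Standard" annotations with Cohen's kappa, keeping the model only if
# kappa >= 0.60. Labels below are hypothetical.
from sklearn.metrics import cohen_kappa_score

gold_labels  = ["Mobility", "Medication", "What Matters", "Mentation", "Medication", "Mobility"]
model_labels = ["Mobility", "Medication", "What Matters", "Medication", "Medication", "Mobility"]

kappa = cohen_kappa_score(gold_labels, model_labels)
print(f"Cohen's kappa = {kappa:.2f}")
if kappa >= 0.60:
    print("Meets the prespecified performance threshold.")
```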
Results: The fine-tuned model demonstrated improved extraction of 4M content (κ = 0.73). Extractions were clustered and labeled, revealing large groups of expressions related to care preferences, medication adjustments, cognitive changes, and mobility issues.
Discussion: The preliminary development of the 4M model and 4M taxonomy enables knowledge extraction from clinical text messages and aids future development of a 4M ontology. Results complement themes and findings in other 4M research.
Conclusion: This research underscores the need for consensus building in ontology creation and the role of language models in developing ontologies, while acknowledging their limitations in logical reasoning and ontological commitments. Further development and context expansion with expert involvement of a 4M ontology are necessary.
Development and evaluation of a 4M taxonomy from nursing home staff text messages using a fine-tuned generative language model. Journal of the American Medical Informatics Association, January 15, 2025. doi:10.1093/jamia/ocaf006
Siru Liu, Allison B McCoy, Adam Wright
Objective: The objectives of this study are to synthesize findings from recent research on retrieval-augmented generation (RAG) and large language models (LLMs) in biomedicine and to provide clinical development guidelines to improve effectiveness.
Materials and methods: We conducted a systematic literature review and a meta-analysis. The report adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines. Searches were performed in 3 databases (PubMed, Embase, PsycINFO) using terms related to "retrieval augmented generation" and "large language model," for articles published in 2023 and 2024. We selected studies that compared baseline LLM performance with RAG performance. We developed a random-effects meta-analysis model, using the odds ratio as the effect size.
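The abstract specifies a random-effects model with the odds ratio as the effect size; the sketch below shows one standard way to do this, DerSimonian-Laird pooling of log odds ratios. The study-level inputs are hypothetical, and the authors' actual software and data are not reproduced.

```python
# Minimal sketch of DerSimonian-Laird random-effects pooling of log odds
# ratios; study-level inputs are hypothetical, not the review's actual data.
import numpy as np
from scipy import stats

log_or = np.log(np.array([1.4, 1.1, 1.8, 1.3]))   # per-study odds ratios (RAG vs baseline)
var    = np.array([0.04, 0.09, 0.06, 0.05])        # per-study variances of the log OR

# Fixed-effect weights and heterogeneity statistic Q
w = 1.0 / var
fe_mean = np.sum(w * log_or) / np.sum(w)
q = np.sum(w * (log_or - fe_mean) ** 2)
df = len(log_or) - 1

# Between-study variance (DerSimonian-Laird estimator)
tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooled estimate, 95% CI, and P value
w_re = 1.0 / (var + tau2)
pooled = np.sum(w_re * log_or) / np.sum(w_re)
se = np.sqrt(1.0 / np.sum(w_re))
ci = np.exp(pooled + np.array([-1.96, 1.96]) * se)
p_value = 2 * (1 - stats.norm.cdf(abs(pooled / se)))
print(f"Pooled OR = {np.exp(pooled):.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}], P = {p_value:.3f}")
```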
Results: Among 335 studies, 20 were included in this literature review. The pooled effect size was 1.35, with a 95% confidence interval of 1.19-1.53, indicating a statistically significant effect (P = .001). We reported clinical tasks, baseline LLMs, retrieval sources and strategies, as well as evaluation methods.
Discussion: Building on our literature review, we developed Guidelines for Unified Implementation and Development of Enhanced LLM Applications with RAG in Clinical Settings to inform clinical applications using RAG.
Conclusion: Overall, RAG implementation improved performance relative to baseline LLMs, with a pooled odds ratio of 1.35. Future research should focus on (1) system-level enhancement: combining RAG with agents; (2) knowledge-level enhancement: deep integration of knowledge into the LLM; and (3) integration-level enhancement: integrating RAG systems within electronic health records.
Improving large language model applications in biomedicine with retrieval-augmented generation: a systematic review, meta-analysis, and clinical development guidelines. Journal of the American Medical Informatics Association, January 15, 2025. doi:10.1093/jamia/ocaf008
Fabiana Cristina Dos Santos, D Scott Batey, Emma S Kay, Haomiao Jia, Olivia R Wood, Joseph A Abua, Susan A Olender, Rebecca Schnall
Objective: To identify demographic, social, and clinical factors associated with HIV self-management and evaluate whether the CHAMPS intervention is associated with changes in an individual's HIV self-management.
Method: This study was a secondary data analysis of a randomized controlled trial evaluating the effects of CHAMPS, an mHealth intervention with community health worker sessions, on HIV self-management in New York City (NYC) and Birmingham. Group comparisons and linear regression analyses identified demographic, social, and clinical factors associated with HIV self-management. We estimated group-by-time interactions (CHAMPS intervention vs standard of care; 6 and 12 months after baseline) to assess whether changes in outcome scores from baseline differed between groups.
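The abstract does not give the exact model specification; one common way to estimate such group-by-time interactions with repeated measures is a linear mixed model with a random intercept per participant, sketched below with statsmodels. Variable names and data are hypothetical, and this is not necessarily the authors' specification.

```python
# Minimal sketch (one common specification, not necessarily the authors'):
# group-by-time interaction for repeated outcome measures, with a random
# intercept per participant. Data and variable names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "participant": [p for p in range(1, 7) for _ in range(3)],
    "group": ["CHAMPS"] * 9 + ["standard_of_care"] * 9,
    "month": [0, 6, 12] * 6,
    "self_management": [3.1, 3.5, 3.7, 2.8, 3.2, 3.6, 3.0, 3.4, 3.5,
                        3.2, 3.2, 3.3, 2.9, 3.0, 2.9, 3.1, 3.0, 3.2],
})

# The group:month interaction terms capture differences between arms in
# change from baseline at each follow-up time.
model = smf.mixedlm("self_management ~ C(group) * C(month)",
                    data=df, groups=df["participant"]).fit()
print(model.summary())
```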
Results: Our findings indicate that missing medical appointments, uncertainty about accessing care, and lack of adherence to antiretroviral therapy are associated with lower HIV self-management. At the NYC site, the CHAMPS intervention showed a statistically significant positive effect on daily HIV self-management (estimate = 0.149, SE = 0.069, 95% CI [0.018 to 0.289]). However, no significant effects were observed for social support or the chronic nature of HIV self-management. At the Birmingham site, the CHAMPS intervention did not yield statistically significant effects on HIV self-management outcomes.
Discussion: Our study suggests that the CHAMPS intervention enhances daily self-management activities for people with HIV at the NYC site, indicating a promising improvement in routine HIV care.
Conclusion: Further research is necessary to explore how various factors influence HIV self-management over time across different regions.
The effect of a combined mHealth and community health worker intervention on HIV self-management. Journal of the American Medical Informatics Association, January 11, 2025. doi:10.1093/jamia/ocae322