Observer: creation of a novel multimodal dataset for outpatient care research.
Kevin B Johnson, Basam Alasaly, Kuk Jin Jang, Eric Eaton, Sriharsha Mopidevi, Ross Koppel
Objective: To support ambulatory care innovation, we created Observer, a multimodal dataset comprising videotaped outpatient visits, electronic health record (EHR) data, and structured surveys. This paper describes the data collection procedures and summarizes the clinical and contextual features of the dataset.
Materials and methods: A multistakeholder steering group shaped recruitment strategies, survey design, and privacy protections. Consented patients and primary care providers (PCPs) were recorded using room-view and egocentric cameras. EHR data, metadata, and audit logs were also captured. A custom de-identification pipeline, combining transcript redaction, voice masking, and facial blurring, ensured HIPAA compliance of the video and EHR data.
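The de-identification pipeline is described only at a high level above. As an illustration of what the facial-blurring stage could look like, here is a minimal OpenCV sketch; the detector choice, parameters, and filename are assumptions, not the Observer implementation:

```python
# Minimal sketch of one stage of video de-identification: detect and
# blur faces frame by frame. Illustrative only; a production pipeline
# would likely use a stronger face detector (assumption).
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def blur_faces(frame):
    """Return a copy of `frame` with every detected face Gaussian-blurred."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    out = frame.copy()
    for (x, y, w, h) in faces:
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return out

cap = cv2.VideoCapture("visit.mp4")  # hypothetical input recording
ok, frame = cap.read()
while ok:
    deidentified = blur_faces(frame)
    # ... write `deidentified` to the output video here ...
    ok, frame = cap.read()
cap.release()
```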
Results: We report on the first 100 visits in this continually growing dataset. Thirteen PCPs from 4 clinics participated. Recording the first 100 visits required approaching 210 patients, of whom 129 (61%) consented; 29 patients missed their scheduled encounter after consenting. Visit lengths ranged from 5 to 100 minutes, spanning preventive care to chronic disease management. Survey responses revealed high satisfaction: 4.24/5 (patients) and 3.94/5 (PCPs). Visit experience was unaffected by the presence of video recording technology.
Discussion: We demonstrate the feasibility of capturing rich, real-world primary care interactions using scalable, privacy-sensitive methods. Room layout and camera placement were key influences on recorded communication and have now been added to the dataset. The Observer dataset enables future clinical AI research and development, communication studies, and informatics education among public and private user groups.
Conclusion: Observer is a new, shareable, real-world clinic encounter research and teaching resource with a representative sample of adult primary care data.
Journal of the American Medical Informatics Association, 2026-02-01, pages 424-433. doi:10.1093/jamia/ocaf182. PMCID: PMC12844583.
Enterprise-wide simultaneous deployment of ambient scribe technology: lessons learned from an academic health system.
Aileen P Wright, Carolynn K Nall, Jacob J H Franklin, Sara N Horst, Yaa A Kumah-Crystal, Adam T Wright, Dara E Mize
Objectives: To report on the feasibility of a simultaneous, enterprise-wide deployment of EHR-integrated ambient scribe technology across a large academic health system.
Materials and methods: On January 15, 2025, ambient scribing was made available to over 2400 ambulatory and emergency department clinicians. We tracked utilization rates, technical support needs, and user feedback.
Results: By March 31, 2025, 20.1% of visit notes incorporated ambient scribing, and 1223 clinicians had used ambient scribing. Among 209 respondents (22.1% of 947 surveyed), 90.9% would be disappointed if they lost access to ambient scribing, and 84.7% reported a positive training experience.
Discussion: Enterprise-wide simultaneous deployment combined with a low-barrier training model enabled immediate access for clinicians and reduced administrative burden by concentrating go-live efforts. Support needs were manageable.
Conclusion: Simultaneous enterprise-wide deployment of ambient scribing was feasible and provided immediate access for clinicians.
Journal of the American Medical Informatics Association, 2026-02-01, pages 457-461. doi:10.1093/jamia/ocaf186. PMCID: PMC12844588.
Transfer-learning on federated observational healthcare data for prediction models using Bayesian sparse logistic regression with informed priors.
Kelly Mohe Li, Jenna Marie Reps, Akihiko Nishimura, Martijn J Schuemie, Marc A Suchard
Objective: To develop a transfer-learning Bayesian sparse logistic regression model that transfers information learned from one dataset to another through an informed prior, facilitating model fitting in small-sample clinical patient-level prediction problems that lack sufficient available information.
Methods: We propose a Bayesian framework for prediction using logistic regression that aims to conduct transfer-learning on regression coefficient information from a larger dataset model (order 10⁵-10⁶ patients by 10⁵ features) into a small-sample model (order 10³ patients). Our approach imposes an informed, hierarchical prior on each regression coefficient defined as a discrete mixture of the Bayesian Bridge shrinkage prior and an informed normal distribution. Performance of the informed model is compared against traditional methods, primarily measured by area under the curve, calibration, bias, and sparsity using both simulations and a real-world problem.
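As a worked sketch of the prior just described (the notation below is ours; the paper's exact parameterization may differ), each coefficient receives a two-component mixture of an informed normal, whose moments come from the large source-data model, and a Bayesian Bridge shrinkage component:

```latex
% Informed hierarchical mixture prior on regression coefficient beta_j.
% Notation is illustrative: w_j is the mixture weight, mu-hat and
% sigma-hat are the source-model estimate and its scale, tau is the
% Bridge global scale, and c(alpha, tau) is a normalizing constant.
\[
  p(\beta_j) \;=\; w_j\,
    \mathcal{N}\!\left(\beta_j \,\middle|\, \hat{\mu}_j^{\mathrm{src}},\,
      \bigl(\hat{\sigma}_j^{\mathrm{src}}\bigr)^{2}\right)
  \;+\; (1 - w_j)\, c(\alpha,\tau)\,
    \exp\!\left(-\left|\beta_j/\tau\right|^{\alpha}\right),
  \qquad 0 < \alpha \le 1 .
\]
```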
Results: Across all experiments, transfer-learning outperformed the traditional L1-regularized model across discrimination, calibration, bias, and sparsity. In fact, even using only a continuous shrinkage prior without the informed prior increased model performance when compared to L1-regularization.
Conclusion: Transfer-learning using informed priors can help fine-tune prediction models in small datasets suffering from a lack of information. A major benefit is that the prior does not depend on patient-level information, so transfer-learning can be conducted without violating privacy. In future work, the model can be applied to learning between disparate databases, or to similar low-information settings such as rare outcome prediction.
Journal of the American Medical Informatics Association, 2026-02-01, pages 409-423. doi:10.1093/jamia/ocaf146. PMCID: PMC12844582.
Re-identification risk for common privacy preserving patient matching strategies when shared with de-identified demographics.
Austin Eliazar, James Thomas Brown, Sara Cinamon, Murat Kantarcioglu, Bradley Malin
Objective: Privacy preserving record linkage (PPRL) refers to techniques used to identify which records refer to the same person across disparate datasets while safeguarding their identities. PPRL is increasingly relied upon to facilitate biomedical research. A common strategy encodes personally identifying information for comparison without disclosing underlying identifiers. As the scale of research datasets expands, it becomes crucial to reassess the privacy risks associated with these encodings. This paper highlights the potential re-identification risks of some of these encodings, demonstrating an attack that exploits encoding repetition across patients.
Materials and methods: The attack leverages repeated PPRL encoding values combined with common demographics shared during PPRL in the clear (e.g., 3-digit ZIP code) to distinguish encodings from one another and ultimately link them to identities in a reference dataset. Using US Census statistics and voter registries, we empirically estimate encodings' re-identification risk against such an attack, while varying multiple factors that influence the risk.
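To make the risk estimate concrete, here is a minimal sketch with hypothetical field names and toy data (the paper's attack and its Census-based estimation are considerably more involved): in this simplified model, a record is re-identifiable when the combination of its repeated-encoding fingerprint and clear-text demographics is unique in the population.

```python
# Sketch: estimate re-identification risk as the fraction of records
# that are unique on (encoding fingerprint, shared demographics).
from collections import Counter

def reidentification_rate(records):
    """records: dicts with 'tokens' (tuple of encoding values repeated
    across datasets), 'zip3', and 'birth_year' -- all hypothetical fields."""
    keys = [(r["tokens"], r["zip3"], r["birth_year"]) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)

population = [
    {"tokens": ("a91f", "7c02"), "zip3": "191", "birth_year": 1980},
    {"tokens": ("a91f", "7c02"), "zip3": "191", "birth_year": 1980},
    {"tokens": ("b3d4", "9e11"), "zip3": "088", "birth_year": 1972},
]
print(f"{reidentification_rate(population):.2f}")  # 0.33: one of three is unique
```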
Results: Re-identification risk for PPRL encodings increases with population size, number of distinct encodings per patient, and amount of demographic information available. Commonly used encodings typically grow from <1% re-identification rate for datasets under one million individuals to 10%-20% for 250 million individuals.
Discussion and conclusion: Re-identification risk often remains low in smaller populations, but increases significantly at the larger scales increasingly encountered today. These risks are common in many PPRL implementations, although, as our work shows, they are avoidable. Choosing better tokens or matching tokens through a third party without the underlying demographics effectively eliminates these risks.
Journal of the American Medical Informatics Association, 2026-02-01, pages 336-346. doi:10.1093/jamia/ocaf183. PMCID: PMC12844594.
AcuKG: a comprehensive knowledge graph for medical acupuncture.
Yiming Li, Xueqing Peng, Suyuan Peng, Jianfu Li, Donghong Pei, Qin Zhang, Yiwei Lu, Yan Hu, Fang Li, Li Zhou, Yongqun He, Cui Tao, Hua Xu, Na Hong
Background: Acupuncture, a key modality in traditional Chinese medicine, is gaining global recognition as a complementary therapy and a subject of increasing scientific interest. However, fragmented and unstructured acupuncture knowledge spread across diverse sources poses challenges for semantic retrieval, reasoning, and in-depth analysis. To address this gap, we developed AcuKG, a comprehensive knowledge graph that systematically organizes acupuncture-related knowledge to support sharing, discovery, and artificial intelligence-driven innovation in the field.
Methods: AcuKG integrates data from multiple sources, including online resources, guidelines, PubMed literature, ClinicalTrials.gov, and multiple ontologies (SNOMED CT, UBERON, and MeSH). We employed entity recognition, relation extraction, and ontology mapping to establish AcuKG, with human-in-the-loop review to ensure data quality. Two use cases evaluated AcuKG's usability: (1) how AcuKG advances acupuncture research for obesity and (2) how AcuKG enhances large language model (LLM) applications in acupuncture question-answering.
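As an illustration of the second use case, a minimal sketch of grounding an LLM prompt in retrieved knowledge-graph triples; the triples, relation names, and prompt wording are hypothetical, and no particular LLM client is assumed:

```python
# Sketch: serialize retrieved (head, relation, tail) triples as context
# for a question-answering prompt, in the spirit of AcuKG + LLM.
def build_prompt(question, triples):
    """Format KG triples as facts the model must ground its answer in."""
    facts = "\n".join(f"- {h} --{r}--> {t}" for h, r, t in triples)
    return (
        "Answer the acupuncture question using only the facts below.\n"
        f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer:"
    )

retrieved = [
    ("ST36", "indicated_for", "obesity"),
    ("ST36", "located_on", "stomach meridian"),
]
prompt = build_prompt("Which acupoints are studied for obesity?", retrieved)
print(prompt)  # pass `prompt` to the chosen LLM, eg, GPT-4o or LLaMA 3
```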
Results: AcuKG comprises 1839 entities and 11 527 relations, mapped to 1836 standard concepts in 3 ontologies. Two use cases demonstrated AcuKG's effectiveness and potential in advancing acupuncture research and supporting LLM applications. In the obesity use case, AcuKG identified highly relevant acupoints (eg, ST25, ST36) and uncovered novel research insights based on evidence from clinical trials and literature. When applied to LLMs in answering acupuncture-related questions, integrating AcuKG with GPT-4o and LLaMA 3 significantly improved accuracy (GPT-4o: 46% → 54%, P = .03; LLaMA 3: 17% → 28%, P = .01).
Conclusion: AcuKG is an open dataset that provides a structured and computational framework for acupuncture applications, bridging traditional practices with acupuncture research and cutting-edge LLM technologies.
Journal of the American Medical Informatics Association, 2026-02-01, pages 359-370. doi:10.1093/jamia/ocaf179. PMCID: PMC12844574.
Assessing genetic counseling efficiency with natural language processing.
Michelle H Nguyen, Carolyn D Applegate, Brittney Murray, Ayah Zirikly, Crystal Tichnell, Catherine Gordon, Lisa R Yanek, Cynthia A James, Casey Overby Taylor
Objective: To build natural language processing (NLP) strategies to characterize measures of genetic counseling (GC) efficiency and classify measures according to phase of GC (pre- or post-genetic testing).
Materials and methods: This study selected and annotated 800 GC notes from 7 clinical specialties in a large academic medical center for NLP model development and validation. The NLP approaches extracted GC efficiency measures, including direct and indirect time and GC phase. The models were then applied to 24 102 GC notes collected from January 2016 through December 2023.
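For intuition about the direct-time measure, a deliberately simplified sketch: a single regular expression pulling a documented direct-time statement from note text. The study's validated NLP models are not reproduced here, and the pattern and example note are assumptions.

```python
# Toy extraction of "direct time in GC" from a free-text note.
import re

DIRECT_TIME = re.compile(
    r"(\d{1,3})\s*(?:minutes|min)\b.*?(?:spent|face[- ]to[- ]face)",
    re.IGNORECASE,
)

note = "50 minutes were spent in direct patient contact discussing results."
match = DIRECT_TIME.search(note)
if match:
    print(int(match.group(1)))  # -> 50
```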
Results: NLP approaches performed well (F1 scores of 0.95 and 0.90 for direct time in GC and GC phase classification, respectively). Our findings showed median direct time in GC of 50 minutes, with significant differences in direct time distributions observed across clinical specialties, time periods (2016-2019 or 2020-2023), delivery modes (in person or telehealth), and GC phase.
Discussion: As referrals to GC increase, there is increasing pressure to improve efficiency. Our NLP strategy was used to generate and summarize real-world evidence of GC time for 7 clinical specialties. These approaches enable future research on the impact of interventions intended to improve GC efficiency.
Conclusion: This work demonstrated the practical value of NLP as a scalable strategy for generating real-world evidence of GC efficiency. Principles presented in this work may also be valuable for health services research in other practice areas.
Journal of the American Medical Informatics Association, 2026-02-01, pages 295-303. doi:10.1093/jamia/ocaf190. PMCID: PMC12743353.
PhenoFit: a framework for determining computable phenotyping algorithm fitness for purpose and reuse.
Laura K Wiley, Luke V Rasmussen, Rebecca T Levinson, Jennifer Malinowski, Sheila M Manemann, Melissa P Wilson, Martin Chapman, Jennifer A Pacheco, Theresa L Walunas, Justin B Starren, Suzette J Bielinski, Rachel L Richesson
Background: Computational phenotyping from electronic health records (EHRs) is essential for clinical research, decision support, and quality/population health assessment, but the proliferation of algorithms for the same conditions makes it difficult to identify which algorithm is most appropriate for reuse.
Objective: To develop a framework for assessing phenotyping algorithm fitness for purpose and reuse.
Fitness for purpose: Phenotyping algorithms are fit for purpose when they identify the intended population with performance characteristics appropriate for the intended application.
Fitness for reuse: Phenotyping algorithms are fit for reuse when they are implementable and generalizable; that is, they identify the same intended population with similar performance characteristics when applied to a new setting.
Conclusions: The PhenoFit framework provides a structured approach to evaluating and adapting phenotyping algorithms for new contexts, increasing the efficiency and consistency of identifying patient populations from EHRs.
Journal of the American Medical Informatics Association, 2026-02-01, pages 536-542. doi:10.1093/jamia/ocaf195. PMCID: PMC12844593.
On embedding-based automatic mapping of clinical classification system: handling linguistic variations and granular inconsistencies.
Santosh Purja Pun, Oliver Obst, Jim Basilakis, Jeewani Anupama Ginige
Objectives: Mapping clinical classification systems, such as the International Classification of Diseases (ICD), is essential yet challenging. While manual mapping remains labor-intensive and lacks scalability, existing embedding-based automatic mapping methods, particularly those leveraging transformer-based pretrained encoders, encounter 2 persistent challenges: (1) linguistic variation and (2) varying granular detail in clinical conditions.
Materials and methods: We introduce an automatic mapping method that combines the representational power of pretrained encoders with the reasoning capability of large language models (LLMs). For each ICD code, we generate (1) hierarchy-augmented (HA) and (2) LLM-generated (LG) descriptions to capture rich semantic nuances, addressing linguistic variation. Furthermore, we introduce a prompting framework (PR) that leverages LLM reasoning to handle granularity mismatches, including source-to-parent mappings.
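A minimal sketch of the embedding-retrieval stage under stated assumptions: the encoder choice and example descriptions are ours, and the HA/LG description generation and LLM reasoning steps are omitted.

```python
# Sketch: encode a source-code description and rank candidate target
# codes by cosine similarity; the top-1 candidate would then be
# verified or refined by the LLM prompting stage.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # any pretrained encoder

source_desc = (
    "Type 2 diabetes mellitus with diabetic nephropathy (kidney complication)"
)
targets = {
    "E11.21": "Type 2 diabetes mellitus with diabetic nephropathy",
    "E11.9": "Type 2 diabetes mellitus without complications",
}

src_vec = model.encode([source_desc])[0]
tgt_vecs = model.encode(list(targets.values()))
sims = tgt_vecs @ src_vec / (
    np.linalg.norm(tgt_vecs, axis=1) * np.linalg.norm(src_vec)
)
print(list(targets)[int(np.argmax(sims))])  # top-1 mapped code
```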
Results: Chapterwise mappings were performed between ICD versions (ICD-9-CM↔ICD-10-CM and ICD-10-AM↔ICD-11) using multiple LLMs. The proposed approach consistently outperformed the baseline across all ICD pairs and chapters. For example, combining HA descriptions with Qwen3-8B-generated descriptions yielded an average top-1 accuracy improvement of 6.5% (0.065) across the mapping cases. A small-scale pilot study further indicated that HA+LG remains effective in more challenging one-to-many mappings.
Conclusions: Our findings demonstrate that integrating the representational power of pretrained encoders with LLM reasoning offers a robust, scalable strategy for automatic ICD mapping.
Journal of the American Medical Informatics Association, 2026-01-28. doi:10.1093/jamia/ocag004.
Translating evidence into practice: adapting TrialGPT for real-world clinical trial eligibility screening.
Mahanazuddin Syed, Muayad Hamidi, Manju Bikkanuri, Nicole Adele Dierschke, Haritha Vardhini Katragadda, Meredith Zozus, Antonio Lucio Teixeira
Objectives: To evaluate the performance of a locally deployed adaptation of TrialGPT, a large language model (LLM) system for identifying trial-eligible patients from unstructured electronic health record (EHR) data.
Materials and methods: TrialGPT was re-engineered for secure deployment at UT Health San Antonio using a locally hosted LLM. It was optimized for real-world data needs through a longitudinal patient-encounter-note hierarchy mirroring EHR documentation. Performance was evaluated in two stages: (1) benchmarking against an expert-adjudicated gold corpus (n = 149) and (2) comparative validation against manual screening (n = 55).
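A sketch of the longitudinal patient-encounter-note hierarchy described above; the dataclass names and fields are hypothetical placeholders, not the authors' schema.

```python
# Sketch: organize unstructured EHR text as patient -> encounter -> note,
# then flatten it into dated, chronological context for LLM screening.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Note:
    note_type: str
    date: str  # ISO format, so string sort is chronological
    text: str

@dataclass
class Encounter:
    encounter_id: str
    date: str
    notes: List[Note] = field(default_factory=list)

@dataclass
class Patient:
    patient_id: str
    encounters: List[Encounter] = field(default_factory=list)

    def chronological_text(self) -> str:
        """Flatten the hierarchy into ordered context for eligibility prompts."""
        chunks = []
        for enc in sorted(self.encounters, key=lambda e: e.date):
            for note in sorted(enc.notes, key=lambda n: n.date):
                chunks.append(f"[{note.date} | {note.note_type}]\n{note.text}")
        return "\n\n".join(chunks)
```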
Results: Against the expert-adjudicated corpus, the system achieved 81.8% sensitivity, 97.8% specificity, and a positive predictive value of 75.0%. Compared with manual screening, it identified more than twice as many truly eligible patients (81.8% vs 36.4%) while preserving equivalent specificity.
Conclusion: The adapted TrialGPT framework operationalizes trial matching, translating EHR data into actionable screening intelligence for efficient, scalable clinical trial recruitment.
Journal of the American Medical Informatics Association, 2026-01-27. doi:10.1093/jamia/ocag006.
NutriRAG: unleashing the power of large language models for food identification and classification through retrieval methods.
Huixue Zhou, Lisa Chow, Lisa Harnack, Satchidananda Panda, Emily N C Manoogian, Mingchen Li, Yongkang Xiao, Rui Zhang
Objectives: This study explores the use of advanced natural language processing (NLP) techniques to enhance food classification and dietary analysis using raw text input from a diet tracking app.
Materials and methods: The study was conducted in 3 stages: data collection, framework development, and application. Data were collected from a 12-week randomized controlled trial (RCT: NCT04259632), in which participants recorded their meals in free-text format using the myCircadianClock app. Only de-identified data were used. We developed nutrition-focused retrieval-augmented generation (NutriRAG), an NLP framework that uses a retrieval-augmented generation approach to enhance food classification from free-text inputs. The framework retrieves relevant examples from a curated database and then leverages large language models, such as GPT-4, to classify user-recorded food items into predefined categories without fine-tuning. NutriRAG was then applied to data from the RCT, which included 77 adults with obesity recruited from the Twin Cities metro area and randomized into 3 intervention groups: time-restricted eating (TRE, 8-hour eating window), caloric restriction (CR, 15% reduction), and unrestricted eating.
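To illustrate the retrieval-augmented classification idea, a toy sketch in which a simple lexical similarity stands in for the framework's retriever; the curated examples and category names are assumptions.

```python
# Sketch: retrieve the most similar labeled entries from a curated
# database and assemble a few-shot classification prompt (no fine-tuning).
def jaccard(a, b):
    """Word-overlap similarity between two free-text food entries."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b)

curated = [
    ("iced caramel latte", "sugary beverage"),
    ("grilled chicken salad", "main dish"),
    ("chocolate chip cookie", "snack/sweet"),
]

def build_prompt(entry, k=2):
    examples = sorted(curated, key=lambda ex: jaccard(entry, ex[0]),
                      reverse=True)[:k]
    shots = "\n".join(f"Food: {t}\nCategory: {c}" for t, c in examples)
    return f"{shots}\nFood: {entry}\nCategory:"

print(build_prompt("vanilla latte with whipped cream"))
# pass the prompt to the LLM (eg, GPT-4) to obtain the predicted category
```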
Results: NutriRAG significantly enhanced classification accuracy and supported analysis of dietary habits, with the retrieval-augmented GPT-4 model achieving a micro-F1 score of 82.24. Both interventions showed dietary alterations: CR participants ate fewer snacks and sugary foods, while TRE participants reduced nighttime eating.
Conclusion: By using artificial intelligence, NutriRAG marks a substantial advancement in food classification and dietary analysis for nutritional assessment. The findings highlight NLP's potential to personalize nutrition and manage diet-related health issues, suggesting further research to expand these models for wider use.
Journal of the American Medical Informatics Association, 2026-01-23. doi:10.1093/jamia/ocag003.