Wanxin Li, Saad Ahmed, Yongjin P Park, Khanh Dao Duc
Objectives: Electronic Health Records (EHRs) sampled from different populations can introduce unwanted biases, limit individual-level data sharing, and make both the data and fitted models difficult to transfer across population groups. In this context, our main goal is to design an effective method for transferring knowledge between population groups, with computable guarantees of suitability, that can be applied to quantify treatment disparities.
Materials and methods: For a model trained in an embedded feature space of one subgroup, our proposed framework, Optimal Transport-based Transfer Learning for EHRs (OTTEHR), combines feature embedding of the data and unbalanced optimal transport (OT) for domain adaptation to another population group. To test our method, we processed and divided the MIMIC-III and MIMIC-IV databases into multiple population groups using ICD codes and multiple labels.
Results: We derive a theoretical bound on the generalization error of our method and interpret it in terms of the Wasserstein distance, the unbalancedness between the source and target domains, and the labeling divergence, which can be used as a guide for assessing suitability for binary classification and regression tasks. In general, our method achieves better accuracy and computational efficiency than standard and machine learning-based transfer learning methods on various tasks. Upon testing our method on populations with different insurance plans, we detect varying levels of disparity in hospital stay duration between groups.
Discussion and conclusion: By leveraging tools from OT theory, our proposed framework enables comparison of statistical models on EHR data across different population groups. As a potential application to clinical decision making, we quantify treatment disparities between population groups. Future directions include applying OTTEHR to broader regression and classification tasks and extending the method to semi-supervised learning.
{"title":"Transport-based transfer learning on Electronic Health Records: application to detection of treatment disparities.","authors":"Wanxin Li, Saad Ahmed, Yongjin P Park, Khanh Dao Duc","doi":"10.1093/jamia/ocaf134","DOIUrl":"10.1093/jamia/ocaf134","url":null,"abstract":"<p><strong>Objectives: </strong>Electronic Health Records (EHRs) sampled from different populations can introduce unwanted biases, limit individual-level data sharing, and make the data and fitted model hardly transferable across different population groups. In this context, our main goal is to design an effective method to transfer knowledge between population groups, with computable guarantees for suitability, and that can be applied to quantify treatment disparities.</p><p><strong>Materials and methods: </strong>For a model trained in an embedded feature space of one subgroup, our proposed framework, Optimal Transport-based Transfer Learning for EHRs (OTTEHR), combines feature embedding of the data and unbalanced optimal transport (OT) for domain adaptation to another population group. To test our method, we processed and divided the MIMIC-III and MIMIC-IV databases into multiple population groups using ICD codes and multiple labels.</p><p><strong>Results: </strong>We derive a theoretical bound for the generalization error of our method, and interpret it in terms of the Wasserstein distance, unbalancedness between the source and target domains, and labeling divergence, which can be used as a guide for assessing the suitability of binary classification and regression tasks. In general, our method achieves better accuracy and computational efficiency compared with standard and machine learning transfer learning methods on various tasks. 
Upon testing our method for populations with different insurance plans, we detect various levels of disparities in hospital duration stay between groups.</p><p><strong>Discussion and conclusion: </strong>By leveraging tools from OT theory, our proposed framework allows to compare statistical models on EHR data between different population groups. As a potential application for clinical decision making, we quantify treatment disparities between different population groups. Future directions include applying OTTEHR to broader regression and classification tasks and extending the method to semi-supervised learning.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"15-25"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758479/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144994218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objective: Accurately measuring patient similarity is essential for precision medicine, enabling personalized predictive modeling, disease subtyping, and individualized treatment by identifying patients with similar characteristics to an index patient. This study aims to develop an electronic health record-based patient similarity estimation framework to enhance personalized predictive modeling for Acute Kidney Injury (AKI), a complex and life-threatening condition where accurate prediction is critical for timely intervention.
Materials and methods: We introduce Similarity Measurement for Acute Kidney Injury Risk Tracking (SMART), a new patient similarity estimation framework with 3 key enhancements: (1) overlap weighting to adjust similarity scores; (2) distance measure optimization; and (3) feature type weight optimization. These enhancements were evaluated using internal and external validation datasets from 2 tertiary academic hospitals to predict AKI risk across varying group sizes of similar patients.
Results: The study analyzed data from 8637 patients in the reference patient pool and 8542 patients in each of the internal and external test sets. Each enhancement was independently evaluated while controlling for other variables to determine its impact on prediction performance. SMART consistently outperformed 3 baseline models on both the internal and external test sets (P<.05) and demonstrated improved performance in certain subpopulations with unique health profiles compared to a traditional machine learning approach.
Discussion: SMART improves the identification of high-quality similar patient groups, enhancing the accuracy of personalized AKI prediction across various group sizes. By accurately identifying clinically relevant similar patients, clinicians can tailor treatments more effectively, advancing personalized care.
{"title":"SMART: a new patient similarity estimation framework for enhanced predictive modeling in acute kidney injury.","authors":"Deyi Li, Alan S L Yu, Dana Y Fuhrman, Mei Liu","doi":"10.1093/jamia/ocaf125","DOIUrl":"10.1093/jamia/ocaf125","url":null,"abstract":"<p><strong>Objective: </strong>Accurately measuring patient similarity is essential for precision medicine, enabling personalized predictive modeling, disease subtyping, and individualized treatment by identifying patients with similar characteristics to an index patient. This study aims to develop an electronic health record-based patient similarity estimation framework to enhance personalized predictive modeling for Acute Kidney Injury (AKI), a complex and life-threatening condition where accurate prediction is critical for timely intervention.</p><p><strong>Materials and methods: </strong>We introduce Similarity Measurement for Acute Kidney Injury Risk Tracking (SMART), a new patient similarity estimation framework with 3 key enhancements: (1) overlap weighting to adjust similarity scores; (2) distance measure optimization; and (3) feature type weight optimization. These enhancements were evaluated using internal and external validation datasets from 2 tertiary academic hospitals to predict AKI risk across varying group sizes of similar patients.</p><p><strong>Results: </strong>The study analyzed data from 8637 patients in the reference patient pool and 8542 patients in each of the internal and external test sets. Each enhancement was independently evaluated while controlling for other variables to determine its impact on prediction performance. 
SMART consistently outperformed 3 baseline models on both the internal and external test sets (P<.05) and demonstrated improved performance in certain subpopulations with unique health profiles compared to a traditional machine learning approach.</p><p><strong>Discussion: </strong>SMART improves the identification of high-quality similar patient groups, enhancing the accuracy of personalized AKI prediction across various group sizes. By accurately identifying clinically relevant similar patients, clinicians can tailor treatments more effectively, advancing personalized care.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"37-48"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758465/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144838431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seth Russell, Peter E DeWitt, Laura Helmkamp, Kathryn Colborn, Charlotte Gray, Margaret Rebull, Yamila L Sierra, Rachel Greer, Lexi Petruccelli, Sara Shankman, Todd C Hankinson, Fuyong Xing, David J Albers, Tellen D Bennett
Objective: Clinicians currently make decisions about placing an intracranial pressure (ICP) monitor in children with traumatic brain injury (TBI) without the benefit of an accurate clinical decision support tool. The goal of this study was to develop and validate a model that predicts placement of an ICP monitor and updates as new information becomes available.
Materials and methods: A prospective observational cohort study was conducted from September 2014 to January 2024. The setting included one US hospital designated as an American College of Surgeons Level 1 Pediatric Trauma Center. Participants were 389 children with acute TBI admitted to the ICU who had at least one Glasgow Coma Scale (GCS) score ≤ 8 or intubation with at least one GCS-Motor ≤ 5. We excluded children who received ICP monitors prior to arrival, those with GCS = 3 and bilateral fixed, dilated pupils, and those with a do not resuscitate order.
Results: Of the 389 participants, 138 received ICP monitoring. Several machine learning models, including a recurrent neural network (RNN), were developed and validated using 4 combinations of input data. The best performing model, an RNN, achieved an F1 of 0.71 within 720 minutes of hospital arrival. The cumulative F1 of the RNN from minute 0 to 720 was 0.61. The best performing non-neural network model, standard logistic regression, achieved an F1 of 0.36 within 720 minutes of hospital arrival.
Conclusions: These findings will contribute to the design and implementation of a multidisciplinary clinical decision support tool for ICP monitor placement in children with TBI.
{"title":"Predicting intracranial pressure monitor placement in children with traumatic brain injury: a prospective cohort study to develop a clinical decision support tool.","authors":"Seth Russell, Peter E DeWitt, Laura Helmkamp, Kathryn Colborn, Charlotte Gray, Margaret Rebull, Yamila L Sierra, Rachel Greer, Lexi Petruccelli, Sara Shankman, Todd C Hankinson, Fuyong Xing, David J Albers, Tellen D Bennett","doi":"10.1093/jamia/ocaf120","DOIUrl":"10.1093/jamia/ocaf120","url":null,"abstract":"<p><strong>Objective: </strong>Clinicians currently make decisions about placing an intracranial pressure (ICP) monitor in children with traumatic brain injury (TBI) without the benefit of an accurate clinical decision support tool. The goal of this study was to develop and validate a model that predicts placement of an ICP monitor and updates as new information becomes available.</p><p><strong>Materials and methods: </strong>A prospective observational cohort study was conducted from September 2014 to January 2024. The setting included one US hospital designated as an American College of Surgeons Level 1 Pediatric Trauma Center. Participants were 389 children with acute TBI admitted to the ICU who had at least one Glasgow Coma Scale (GCS) score ≤ 8 or intubation with at least one GCS-Motor ≤ 5. We excluded children who received ICP monitors prior to arrival, those with GCS = 3 and bilateral fixed, dilated pupils, and those with a do not resuscitate order.</p><p><strong>Results: </strong>Of the 389 participants, 138 received ICP monitoring. Several machine learning models, including a recurrent neural network (RNN), were developed and validated using 4 combinations of input data. The best performing model, an RNN, achieved an F1 of 0.71 within 720 minutes of hospital arrival. The cumulative F1 of the RNN from minute 0 to 720 was 0.61. 
The best performing non-neural network model, standard logistic regression, achieved an F1 of 0.36 within 720 minutes of hospital arrival.</p><p><strong>Conclusions: </strong>These findings will contribute to design and implementation of a multidisciplinary clinical decision support tool for ICP monitor placement in children with TBI.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"182-192"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758473/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144762166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alison W Xin, Dylan M Nielson, Karolin Rose Krause, Guilherme Fiorini, Nick Midgley, Francisco Pereira, Juan Antonio Lossio-Ventura
Objective: We aim to use large language models (LLMs) to detect mentions of more nuanced psychotherapeutic outcomes and impacts than previously considered in transcripts of interviews with adolescents with depression. Our clinical authors previously created a novel coding framework containing fine-grained therapy outcomes that go beyond binary classification (eg, depression vs control), based on qualitative analysis embedded within a clinical study of depression. Moreover, we seek to demonstrate that embeddings from LLMs are informative enough to accurately label these experiences.
Materials and methods: Data were drawn from interviews in which text segments were annotated with different outcome labels. Five open-source LLMs were evaluated for classifying outcomes from the coding framework. Classification experiments were carried out on the original interview transcripts. We then repeated those experiments on versions of the data produced by breaking the segments into conversation turns, or by keeping only non-interviewer utterances (monologues).
Results: We used classification models to predict 31 outcomes and 8 derived labels, for 3 different text segmentations. Area under the ROC curve scores ranged between 0.6 and 0.9 for the original segmentation and 0.7 and 1.0 for the monologues and turns.
Discussion: LLM-based classification models could identify outcomes important to adolescents, such as friendships or academic and vocational functioning, in text transcripts of patient interviews. By using clinical data, we also aim to better generalize to clinical settings compared to studies based on public social media data.
Conclusion: Our results demonstrate that fine-grained therapy outcome coding in psychotherapeutic text is feasible, and can be used to support the quantification of important outcomes for downstream uses.
{"title":"Using large language models to detect outcomes in qualitative studies of adolescent depression.","authors":"Alison W Xin, Dylan M Nielson, Karolin Rose Krause, Guilherme Fiorini, Nick Midgley, Francisco Pereira, Juan Antonio Lossio-Ventura","doi":"10.1093/jamia/ocae298","DOIUrl":"10.1093/jamia/ocae298","url":null,"abstract":"<p><strong>Objective: </strong>We aim to use large language models (LLMs) to detect mentions of nuanced psychotherapeutic outcomes and impacts than previously considered in transcripts of interviews with adolescent depression. Our clinical authors previously created a novel coding framework containing fine-grained therapy outcomes beyond the binary classification (eg, depression vs control) based on qualitative analysis embedded within a clinical study of depression. Moreover, we seek to demonstrate that embeddings from LLMs are informative enough to accurately label these experiences.</p><p><strong>Materials and methods: </strong>Data were drawn from interviews, where text segments were annotated with different outcome labels. Five different open-source LLMs were evaluated to classify outcomes from the coding framework. Classification experiments were carried out in the original interview transcripts. Furthermore, we repeated those experiments for versions of the data produced by breaking those segments into conversation turns, or keeping non-interviewer utterances (monologues).</p><p><strong>Results: </strong>We used classification models to predict 31 outcomes and 8 derived labels, for 3 different text segmentations. Area under the ROC curve scores ranged between 0.6 and 0.9 for the original segmentation and 0.7 and 1.0 for the monologues and turns.</p><p><strong>Discussion: </strong>LLM-based classification models could identify outcomes important to adolescents, such as friendships or academic and vocational functioning, in text transcripts of patient interviews. 
By using clinical data, we also aim to better generalize to clinical settings compared to studies based on public social media data.</p><p><strong>Conclusion: </strong>Our results demonstrate that fine-grained therapy outcome coding in psychotherapeutic text is feasible, and can be used to support the quantification of important outcomes for downstream uses.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"79-89"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758459/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142814632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background and significance: Ambient digital scribe (ADS) platforms, which combine ambient speech recognition and large language models to generate clinical documentation, are currently undergoing rapid clinical adoption. Early data suggest that ADS utilization may reduce documentation burden and improve provider efficiency; however, the ethical implications of this largely unregulated technology remain relatively unexamined.
Findings: In this article, we identify and explore 4 key ethical issues surrounding ADS technology (safety, bias, data ownership, and justice) from a range of stakeholder perspectives. We provide an overview of current international regulatory policies, highlighting the need for standardized evaluation and reporting guidelines.
Recommendations: Drawing on established ethical frameworks, we propose actionable recommendations for safe and equitable ADS implementation, including standardized evaluation metrics, regulatory oversight, and safeguards at institutional and end-user levels.
Conclusion: Ensuring the ethical implementation of ADS technology is essential for actualizing its potential benefits while upholding foundational principles of safety, equity, and transparency in clinical practice.
{"title":"Ethical considerations for clinical adoption of ambient digital scribe technology.","authors":"Taylor N Anderson, Vishnu Mohan, Jeffrey A Gold","doi":"10.1093/jamia/ocaf227","DOIUrl":"https://doi.org/10.1093/jamia/ocaf227","url":null,"abstract":"<p><strong>Background and significance: </strong>Ambient digital scribe (ADS) platforms, which combine ambient speech recognition and large language models to generate clinical documentation, are currently undergoing rapid clinical adoption. Early data suggest that ADS utilization may reduce documentation burden and improve provider efficiency; however, the ethical implications of this largely unregulated technology remain relatively unexamined.</p><p><strong>Findings: </strong>In this article, we identify and explore 4 key ethical issues surrounding ADS technology-safety, bias, data ownership, and justice-from a range of stakeholder perspectives. We provide an overview of current international regulatory policies, highlighting the need for standardized evaluation and reporting guidelines.</p><p><strong>Recommendations: </strong>Drawing on established ethical frameworks, we propose actionable recommendations for safe and equitable ADS implementation, including standardized evaluation metrics, regulatory oversight, and safeguards at institutional and end-user levels.</p><p><strong>Conclusion: </strong>Ensuring the ethical implementation of ADS technology is essential for actualizing its potential benefits while upholding foundational principles of safety, equity, and transparency in clinical practice.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-12-26","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145844472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benjamin D Wissel, Zana Percy, Tanner J Zachem, Brett Beaulieu-Jones, Isaac S Kohane, Stuart L Goldstein, Emrah Gecili, Judith W Dexheimer
Objective: To understand the heterogeneous treatment effects of electronic alerts for acute kidney injury (AKI).
Materials and methods: Secondary analysis of individual patient data from 3 randomized controlled trials. Our outcome measure was 14-day all-cause mortality. Data from the ELAIA-1 trial were used to predict the individualized effect of alerts on mortality based on patients' phenotype. Results were internally validated on a holdout dataset and externally validated using data from 2 additional trials: UPenn and ELAIA-2. We used machine learning-based methods and performed a meta-analysis on individual patient data to identify patient subgroups whose risk of mortality was associated with alerts. In addition, provider actions following alerts were examined to explain how alerts impacted patient mortality.
Results: Compared to patients who were predicted to be harmed by an alert, patients predicted to benefit had a lower risk of death in both the internal validation cohort (n = 1809 patients; P for interaction = .045) and both external validation cohorts (n = 7453 patients; P for interaction < .0001). In the external cohorts, 43 deaths may have been preventable if alerts had been restricted to likely beneficiaries. Machine learning-based meta-analysis identified reduced mortality with alerts among patients with higher blood pressures (BP) and lower predicted risk, but increased mortality in non-urban and non-teaching hospitals. Provider responses to alerts differed across subgroups.
Discussion: Our findings indicate substantial heterogeneity in the effects of AKI alerts on patient mortality. Tailoring alert delivery based on predicted benefit may mitigate harm and enhance clinical outcomes.
Conclusion: Individualizing automated alerts may reduce all-cause mortality. A prospective trial of individualized alerts is needed to confirm these results.
Trial registration: https://clinicaltrials.gov/ct2/show/NCT02753751 and https://clinicaltrials.gov/ct2/show/NCT02771977.
{"title":"Heterogenous effect of automated alerts on mortality.","authors":"Benjamin D Wissel, Zana Percy, Tanner J Zachem, Brett Beaulieu-Jones, Isaac S Kohane, Stuart L Goldstein, Emrah Gecili, Judith W Dexheimer","doi":"10.1093/jamia/ocaf222","DOIUrl":"10.1093/jamia/ocaf222","url":null,"abstract":"<p><strong>Objective: </strong>To understand the heterogeneous treatment effects of electronic alerts for acute kidney injury (AKI).</p><p><strong>Materials and methods: </strong>Secondary analysis of individual patient data from 3 randomized controlled trials. Our outcome measure was 14-day all-cause mortality. Data from the ELAIA-1 trial were used to predict the individualized effect of alerts on mortality based on patients' phenotype. Results were internally validated on a holdout dataset and externally validated using data from 2 additional trials: UPenn and ELAIA-2. We used machine learning-based methods and performed a meta-analysis on individual patient data to identify patient subgroups whose risk of mortality was associated with alerts. In addition, provider actions following alerts were examined to explain how alerts impacted patient mortality.</p><p><strong>Results: </strong>Compared to patients who were predicted to be harmed by an alert, patients predicted to benefit had a lower risk of death in both the internal validation cohort (n = 1809 patients; Pinteraction = .045) and both external validation cohorts (n = 7453 patients; Pinteraction < .0001). In external cohorts, 43 deaths may have been preventable if alerts were restricted to likely beneficiaries. Machine-learning based meta-analysis identified reduced mortality with alerts among patients with higher blood pressures (BP) and lower predicted risk, but increased mortality in non-urban and non-teaching hospitals. Provider responses to alerts differed across subgroups.</p><p><strong>Discussion: </strong>Our findings indicate substantial heterogeneity in the effects of AKI alerts on patient mortality. 
Tailoring alert delivery based on predicted benefit may mitigate harm and enhance clinical outcomes.</p><p><strong>Conclusion: </strong>Individualizing automated alerts may reduce all-cause mortality. A prospective trial of individualized alerts is needed to confirm these results.</p><p><strong>Trial registration: </strong>https://clinicaltrials.gov/ct2/show/NCT02753751 and https://clinicaltrials.gov/ct2/show/NCT02771977.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145829031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sarah Conderino, H Lester Kirchner, Lorna E Thorpe, Jasmin Divers, Annemarie G Hirsch, Cara M Nordberg, Brian S Schwartz, Lu Zhang, Bo Cai, Caroline Rudisill, Jihad S Obeid, Angela Liese, Katie S Allen, Brian E Dixon, Tessa Crume, Dana Dabelea, Shawna Burgett, Anna Bellatorre, Hui Shao, Jiang Bian, Yi Guo, Sarah Bost, Tianchen Lyu, Kristi Reynolds, Matthew T Mefford, Hui Zhou, Matt Zhou, Eva Lustigova, Levon H Utidjian, Mitchell Maltenfort, Manmohan Kamboj, Eneida A Mendonca, Patrick Hanley, Ibrahim Zaganjor, Meda E Pavkov, Marc Rosenman, Andrea R Titus
Objective: We discuss implications of potential ascertainment biases for studies examining diabetes risk following SARS-CoV-2 infection using electronic health records (EHRs). We quantitatively explore sensitivity of results to misclassification of COVID-19 status using data from the U.S.-based Diabetes in Children, Adolescents and Young Adults (DiCAYA) Network on children (≤17 years) and young adults (18-44 years).
Materials and methods: In our retrospective case study from the DiCAYA Network, SARS-CoV-2 was identified using labs and diagnoses from 6/1/2020-12/31/2021. Patients were followed through 12/31/2022 for new diabetes diagnoses. Sites examined incident diabetes by COVID-19 status using Cox proportional hazards models. Results were pooled in meta-analyses. A bias analysis examined potential impact of COVID-19 misclassification scenarios on results, guided by hypotheses that sensitivity would be < 50% and would be higher among those who developed diabetes.
Results: Prevalence of documented COVID-19 was low overall and variable across sites (children: 4.4%-7.7%, young adults: 6.2%-22.7%). Individuals with documented COVID-19 were at higher risk of incident diabetes compared to those with no documented infection, but results were heterogeneous across sites. Findings were highly sensitive to COVID-19 misclassification assumptions. Observed results could be biased away from the null under several differential misclassification scenarios.
Discussion: Although EHR-based documentation of COVID-19 was associated with incident diabetes, COVID-19 phenotypes likely had low sensitivity, with considerable variation across sites. Misclassification assumptions strongly impacted interpretation of results.
Conclusion: Given the potential for low phenotype sensitivity and misclassification, caution is warranted when interpreting analyses of COVID-19 and incident diabetes using clinical or administrative databases.
{"title":"Multi-site analysis of COVID-19 and new-onset diabetes reveals need for improved sensitivity of EHR-based COVID-19 phenotypes-a DiCAYA network analysis.","authors":"Sarah Conderino, H Lester Kirchner, Lorna E Thorpe, Jasmin Divers, Annemarie G Hirsch, Cara M Nordberg, Brian S Schwartz, Lu Zhang, Bo Cai, Caroline Rudisill, Jihad S Obeid, Angela Liese, Katie S Allen, Brian E Dixon, Tessa Crume, Dana Dabelea, Shawna Burgett, Anna Bellatorre, Hui Shao, Jiang Bian, Yi Guo, Sarah Bost, Tianchen Lyu, Kristi Reynolds, Matthew T Mefford, Hui Zhou, Matt Zhou, Eva Lustigova, Levon H Utidjian, Mitchell Maltenfort, Manmohan Kamboj, Eneida A Mendonca, Patrick Hanley, Ibrahim Zaganjor, Meda E Pavkov, Marc Rosenman, Andrea R Titus","doi":"10.1093/jamia/ocaf229","DOIUrl":"10.1093/jamia/ocaf229","url":null,"abstract":"<p><strong>Objective: </strong>We discuss implications of potential ascertainment biases for studies examining diabetes risk following SARS-CoV-2 infection using electronic health records (EHRs). We quantitatively explore sensitivity of results to misclassification of COVID-19 status using data from the U.S.-based Diabetes in Children, Adolescents and Young Adults (DiCAYA) Network on children (≤17 years) and young adults (18-44 years).</p><p><strong>Materials and methods: </strong>In our retrospective case study from the DiCAYA Network, SARS-CoV-2 was identified using labs and diagnoses from 6/1/2020-12/31/2021. Patients were followed through 12/31/2022 for new diabetes diagnoses. Sites examined incident diabetes by COVID-19 status using Cox proportional hazards models. Results were pooled in meta-analyses. 
A bias analysis examined potential impact of COVID-19 misclassification scenarios on results, guided by hypotheses that sensitivity would be < 50% and would be higher among those who developed diabetes.</p><p><strong>Results: </strong>Prevalence of documented COVID-19 was low overall and variable across sites (children: 4.4%-7.7%, young adults: 6.2%-22.7%). Individuals with documented COVID-19 were at higher risk of incident diabetes compared to those with no documented infection, but results were heterogeneous across sites. Findings were highly sensitive to COVID-19 misclassification assumptions. Observed results could be biased away from the null under several differential misclassification scenarios.</p><p><strong>Discussion: </strong>Although EHR-based documentation of COVID-19 was associated with incident diabetes, COVID-19 phenotypes likely had low sensitivity, with considerable variation across sites. Misclassification assumptions strongly impacted interpretation of results.</p><p><strong>Conclusion: </strong>Given the potential for low phenotype sensitivity and misclassification, caution is warranted when interpreting analyses of COVID-19 and incident diabetes using clinical or administrative databases.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12884381/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145829065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Woo Yeon Park, Teri Sippel Schmidt, Gabriel Salvador, Kevin O'Donnell, Brad Genereaux, Kyulee Jeon, Seng Chan You, Blake E Dewey, Paul Nagy
{"title":"Response to \"toward semantic interoperability of imaging and clinical data: reflections on the DICOM-OMOP integration framework\".","authors":"Woo Yeon Park, Teri Sippel Schmidt, Gabriel Salvador, Kevin O'Donnell, Brad Genereaux, Kyulee Jeon, Seng Chan You, Blake E Dewey, Paul Nagy","doi":"10.1093/jamia/ocaf216","DOIUrl":"https://doi.org/10.1093/jamia/ocaf216","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145844527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Chen, Patrick Li, Ealia Khoshkish, Seungmin Lee, Tony Ning, Umair Tahir, Henry C Y Wong, Michael S F Lee, Srinivas Raman
Objectives: To develop AutoReporter, a large language model (LLM) system that automates evaluation of adherence to research reporting guidelines.
Materials and methods: Eight prompt-engineering and retrieval strategies coupled with reasoning and general-purpose LLMs were benchmarked on the SPIRIT-CONSORT-TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews.
Results: AutoReporter, a zero-shot, no-retrieval prompt coupled with the o3-mini reasoning LLM, demonstrated strong accuracy (CONSORT: 90.09%; SPIRIT: 92.07%) and substantial agreement with human raters (Cohen's κ: CONSORT = 0.70; SPIRIT = 0.77), with runtimes of 617.26 s (CONSORT) and 544.51 s (SPIRIT) and costs of 0.68 USD (CONSORT) and 0.65 USD (SPIRIT). AutoReporter achieved a mean accuracy of 91.8% and substantial agreement (Cohen's κ > 0.6) with expert ratings on the BenchReport benchmark.
Discussion: Structured prompting alone can match or exceed fine-tuned domain models while forgoing manually annotated corpora and computationally intensive training.
Conclusion: Large language models can feasibly automate reporting guideline adherence assessments for scalable quality control in scientific research reporting. AutoReporter is publicly accessible at https://autoreporter.streamlit.app.
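The κ thresholds cited above follow the common convention (Landis and Koch) of reading κ > 0.6 as substantial agreement. As a reference point, here is a minimal pure-Python sketch of Cohen's κ for two raters over the same items (function name is ours, for illustration):

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' labels over the same items."""
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    labels = sorted(set(r1) | set(r2))
    # Observed agreement: fraction of items the raters label identically
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected agreement under independent rater marginals
    pe = sum((r1.count(l) / n) * (r2.count(l) / n) for l in labels)
    return (po - pe) / (1 - pe)
```

κ is 1.0 for perfect agreement and 0.0 when agreement is no better than chance given each rater's label frequencies, which is why it is preferred over raw percent agreement for rating tasks with imbalanced labels like guideline-adherence checklists.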
{"title":"AutoReporter: development of an artificial intelligence tool for automated assessment of research reporting guideline adherence.","authors":"David Chen, Patrick Li, Ealia Khoshkish, Seungmin Lee, Tony Ning, Umair Tahir, Henry C Y Wong, Michael S F Lee, Srinivas Raman","doi":"10.1093/jamia/ocaf223","DOIUrl":"https://doi.org/10.1093/jamia/ocaf223","url":null,"abstract":"<p><strong>Objectives: </strong>To develop AutoReporter, a large language model (LLM) system that automates evaluation of adherence to research reporting guidelines.</p><p><strong>Materials and methods: </strong>Eight prompt-engineering and retrieval strategies coupled with reasoning and general-purpose LLMs were benchmarked on the SPIRIT-CONSORT-TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews.</p><p><strong>Results: </strong>AutoReporter, a zero-shot, no-retrieval prompt coupled with the o3-mini reasoning LLM, demonstrated strong accuracy (CONSORT 90.09%; SPIRIT: 92.07%), substantial agreement with humans (CONSORT Cohen's κ = 0.70, SPIRIT Cohen's κ = 0.77), runtime (CONSORT: 617.26 s; SPIRIT: 544.51 s), and cost (CONSORT: 0.68 USD; SPIRIT: 0.65 USD). AutoReporter achieved a mean accuracy of 91.8% and substantial agreement (Cohen's κ > 0.6) with expert ratings from the BenchReport benchmark.</p><p><strong>Discussion: </strong>Structured prompting alone can match or exceed fine-tuned domain models while forgoing manually annotated corpora and computationally intensive training.</p><p><strong>Conclusion: </strong>Large language models can feasibly automate reporting guideline adherence assessments for scalable quality control in scientific research reporting. 
AutoReporter is publicly accessible at https://autoreporter.streamlit.app.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michelle Gomez, Ellen W Clayton, Colin G Walsh, Kim M Unertl
Objectives: Trafficked persons experience adverse health consequences and seek help, but many go unrecognized by health-care professionals. This study explored professionals' perspectives on current approaches toward identifying and supporting trafficked persons in health-care settings, highlighting current technology roles, gaps, and future directions.
Materials and methods: We developed an interview guide to investigate current human trafficking (HT) approaches, safety procedures, and HT education. Semistructured interviews were conducted via Zoom, iteratively coded in Dedoose, and analyzed using a thematic analysis approach.
Results: We interviewed 19 health-care and community group professionals and identified 3 themes: (1) Participants described a responsibility to build trust with patients through compassionate communication, rapport, and trauma-informed approaches across different stages of care. (2) Technology played a dual role, as professionals navigated both benefits and challenges of tools such as Zoom, virtual interpreters, and cameras in trust building. (3) Safety and privacy concerns guided how participants documented patient encounters and shared community resources, ensuring confidentiality while supporting patient and community well-being.
Discussion: Technology can both support and hinder trust in health care, directly affecting trafficked patients and their safety. Informatics can improve care for trafficked persons, but further research is needed on technology-based interventions. We provide recommendations to strengthen trust, enhance safety, support trauma-informed care, and promote safe documentation practices.
Conclusion: Effective sociotechnical approaches rely on trust, safety, and mindful documentation to support trafficked patients. Future research directions include refining the role of informatics in trauma-informed care to strengthen trust and mitigate unintended consequences.
{"title":"Identifying and supporting trafficked individuals: provider and community organization perspectives on existing sociotechnical approaches.","authors":"Michelle Gomez, Ellen W Clayton, Colin G Walsh, Kim M Unertl","doi":"10.1093/jamia/ocaf220","DOIUrl":"https://doi.org/10.1093/jamia/ocaf220","url":null,"abstract":"<p><strong>Objectives: </strong>Trafficked persons experience adverse health consequences and seek help, but many go unrecognized by health-care professionals. This study explored professionals' perspectives on current approaches toward identifying and supporting trafficked persons in health-care settings, highlighting current technology roles, gaps, and future directions.</p><p><strong>Materials and methods: </strong>We developed an interview guide to investigate current human trafficking (HT) approaches, safety procedures, and HT education. Semistructured interviews were conducted via Zoom, iteratively coded in Dedoose, and analyzed using a thematic analysis approach.</p><p><strong>Results: </strong>We interviewed 19 health-care and community group professionals and identified 3 themes: (1) participants described a responsibility to build trust with patients through compassionate communication, rapport, and trauma-informed approaches across different stages of care. (2) Technology played a dual role, as professionals navigated both benefits and challenges of tools such as Zoom, virtual interpreters, and cameras in trust building. (3) Safety and privacy concerns guided how participants documented patient encounters and shared community resources, ensuring confidentiality while supporting patient and community well-being.</p><p><strong>Discussion: </strong>Technology can both support and hinder trust in health care, directly affecting trafficked patients and their safety. Informatics can improve care for trafficked persons, but further research is needed on technology-based interventions. 
We provide recommendations to strengthen trust, enhance safety, support trauma-informed care, and promote safe documentation practices.</p><p><strong>Conclusion: </strong>Effective sociotechnical approaches rely on trust, safety, and mindful documentation to support trafficked patients. Future research directions include refining the role of informatics in trauma-informed care to strengthen trust and mitigate unintended consequences.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145806132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}