Sarah Conderino, H Lester Kirchner, Lorna E Thorpe, Jasmin Divers, Annemarie G Hirsch, Cara M Nordberg, Brian S Schwartz, Lu Zhang, Bo Cai, Caroline Rudisill, Jihad S Obeid, Angela Liese, Katie S Allen, Brian E Dixon, Tessa Crume, Dana Dabelea, Shawna Burgett, Anna Bellatorre, Hui Shao, Jiang Bian, Yi Guo, Sarah Bost, Tianchen Lyu, Kristi Reynolds, Matthew T Mefford, Hui Zhou, Matt Zhou, Eva Lustigova, Levon H Utidjian, Mitchell Maltenfort, Manmohan Kamboj, Eneida A Mendonca, Patrick Hanley, Ibrahim Zaganjor, Meda E Pavkov, Marc Rosenman, Andrea R Titus
Objective: We discuss implications of potential ascertainment biases for studies examining diabetes risk following SARS-CoV-2 infection using electronic health records (EHRs). We quantitatively explore sensitivity of results to misclassification of COVID-19 status using data from the U.S.-based Diabetes in Children, Adolescents and Young Adults (DiCAYA) Network on children (≤17 years) and young adults (18-44 years).
Materials and methods: In our retrospective case study from the DiCAYA Network, SARS-CoV-2 was identified using labs and diagnoses from 6/1/2020-12/31/2021. Patients were followed through 12/31/2022 for new diabetes diagnoses. Sites examined incident diabetes by COVID-19 status using Cox proportional hazards models. Results were pooled in meta-analyses. A bias analysis examined potential impact of COVID-19 misclassification scenarios on results, guided by hypotheses that sensitivity would be < 50% and would be higher among those who developed diabetes.
Results: Prevalence of documented COVID-19 was low overall and variable across sites (children: 4.4%-7.7%, young adults: 6.2%-22.7%). Individuals with documented COVID-19 were at higher risk of incident diabetes compared to those with no documented infection, but results were heterogeneous across sites. Findings were highly sensitive to COVID-19 misclassification assumptions. Observed results could be biased away from the null under several differential misclassification scenarios.
Discussion: Although EHR-based documentation of COVID-19 was associated with incident diabetes, COVID-19 phenotypes likely had low sensitivity, with considerable variation across sites. Misclassification assumptions strongly impacted interpretation of results.
Conclusion: Given the potential for low phenotype sensitivity and misclassification, caution is warranted when interpreting analyses of COVID-19 and incident diabetes using clinical or administrative databases.
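The bias analysis described above can be illustrated with a small quantitative bias analysis script. This is a generic sketch, not the study's code: the counts, sensitivities, and specificities below are invented to mirror the stated hypotheses (phenotype sensitivity below 50%, higher among those who developed diabetes), and the correction uses the standard back-calculation for a misclassified exposure.

```python
def corrected_counts(obs_exposed, obs_unexposed, se, sp):
    """Back-calculate true exposure counts from observed counts,
    given assumed sensitivity (se) and specificity (sp) of the
    exposure phenotype (standard misclassification correction)."""
    total = obs_exposed + obs_unexposed
    true_exposed = (obs_exposed - (1 - sp) * total) / (se - (1 - sp))
    return true_exposed, total - true_exposed


def corrected_risk_ratio(cases, noncases, se_cases, sp_cases, se_non, sp_non):
    """Risk ratio after correcting exposure misclassification,
    allowing sensitivity/specificity to differ by outcome status
    (a differential misclassification scenario)."""
    a, b = corrected_counts(*cases, se_cases, sp_cases)    # incident diabetes
    c, d = corrected_counts(*noncases, se_non, sp_non)     # no diabetes
    return (a / (a + c)) / (b / (b + d))


# Invented counts: (documented COVID-19, no documented COVID-19).
cases = (30, 170)        # patients who developed diabetes
noncases = (970, 8830)   # patients who did not
observed_rr = (cases[0] / (cases[0] + noncases[0])) / (
    cases[1] / (cases[1] + noncases[1]))            # ≈ 1.59

# Hypothesized scenario: sensitivity < 50% overall, higher among
# those who developed diabetes; near-perfect specificity.
rr = corrected_risk_ratio(cases, noncases,
                          se_cases=0.60, sp_cases=0.99,
                          se_non=0.40, sp_non=0.99)  # ≈ 1.05
```

With these illustrative inputs the corrected risk ratio falls close to 1 while the uncorrected ratio is about 1.6, showing how differential sensitivity can bias observed associations away from the null, as the abstract cautions.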
Title: Multi-site analysis of COVID-19 and new-onset diabetes reveals need for improved sensitivity of EHR-based COVID-19 phenotypes - a DiCAYA network analysis.
Journal of the American Medical Informatics Association, published 2025-12-24. doi: 10.1093/jamia/ocaf229. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12884381/pdf/
Woo Yeon Park, Teri Sippel Schmidt, Gabriel Salvador, Kevin O'Donnell, Brad Genereaux, Kyulee Jeon, Seng Chan You, Blake E Dewey, Paul Nagy
Title: Response to "toward semantic interoperability of imaging and clinical data: reflections on the DICOM-OMOP integration framework".
Journal of the American Medical Informatics Association, published 2025-12-24. doi: 10.1093/jamia/ocaf216.
David Chen, Patrick Li, Ealia Khoshkish, Seungmin Lee, Tony Ning, Umair Tahir, Henry C Y Wong, Michael S F Lee, Srinivas Raman
Objectives: To develop AutoReporter, a large language model (LLM) system that automates evaluation of adherence to research reporting guidelines.
Materials and methods: Eight prompt-engineering and retrieval strategies coupled with reasoning and general-purpose LLMs were benchmarked on the SPIRIT-CONSORT-TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews.
Results: AutoReporter, a zero-shot, no-retrieval prompt coupled with the o3-mini reasoning LLM, demonstrated strong accuracy (CONSORT: 90.09%; SPIRIT: 92.07%) and substantial agreement with human raters (CONSORT: Cohen's κ = 0.70; SPIRIT: Cohen's κ = 0.77), at modest runtime (CONSORT: 617.26 s; SPIRIT: 544.51 s) and cost (CONSORT: 0.68 USD; SPIRIT: 0.65 USD). AutoReporter achieved a mean accuracy of 91.8% and substantial agreement (Cohen's κ > 0.6) with expert ratings on the BenchReport benchmark.
Discussion: Structured prompting alone can match or exceed fine-tuned domain models while forgoing manually annotated corpora and computationally intensive training.
Conclusion: Large language models can feasibly automate reporting guideline adherence assessments for scalable quality control in scientific research reporting. AutoReporter is publicly accessible at https://autoreporter.streamlit.app.
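The agreement statistic reported above can be reproduced with a few lines of Python. This is a minimal sketch of Cohen's κ for two raters; the example ratings are hypothetical and not drawn from the SPIRIT-CONSORT-TM corpus or BenchReport.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for the
    agreement expected by chance from each rater's label frequencies."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label]
                   for label in set(freq_a) | set(freq_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical item-level adherence judgments (1 = reported, 0 = not).
human = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
model = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
kappa = cohens_kappa(human, model)  # 0.8 raw agreement → κ ≈ 0.52
```

Note how two disagreements out of ten already pull κ down to roughly 0.52 here: raw agreement overstates concordance when both raters mark most items as reported.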
Title: AutoReporter: development of an artificial intelligence tool for automated assessment of research reporting guideline adherence.
Journal of the American Medical Informatics Association, published 2025-12-23. doi: 10.1093/jamia/ocaf223.
Michelle Gomez, Ellen W Clayton, Colin G Walsh, Kim M Unertl
Objectives: Trafficked persons experience adverse health consequences and seek help, but many go unrecognized by health-care professionals. This study explored professionals' perspectives on current approaches toward identifying and supporting trafficked persons in health-care settings, highlighting current technology roles, gaps, and future directions.
Materials and methods: We developed an interview guide to investigate current human trafficking (HT) approaches, safety procedures, and HT education. Semistructured interviews were conducted via Zoom, iteratively coded in Dedoose, and analyzed using a thematic analysis approach.
Results: We interviewed 19 health-care and community group professionals and identified 3 themes: (1) participants described a responsibility to build trust with patients through compassionate communication, rapport, and trauma-informed approaches across different stages of care. (2) Technology played a dual role, as professionals navigated both benefits and challenges of tools such as Zoom, virtual interpreters, and cameras in trust building. (3) Safety and privacy concerns guided how participants documented patient encounters and shared community resources, ensuring confidentiality while supporting patient and community well-being.
Discussion: Technology can both support and hinder trust in health care, directly affecting trafficked patients and their safety. Informatics can improve care for trafficked persons, but further research is needed on technology-based interventions. We provide recommendations to strengthen trust, enhance safety, support trauma-informed care, and promote safe documentation practices.
Conclusion: Effective sociotechnical approaches rely on trust, safety, and mindful documentation to support trafficked patients. Future research directions include refining the role of informatics in trauma-informed care to strengthen trust and mitigate unintended consequences.
Title: Identifying and supporting trafficked individuals: provider and community organization perspectives on existing sociotechnical approaches.
Journal of the American Medical Informatics Association, published 2025-12-22. doi: 10.1093/jamia/ocaf220.
Sanjay Basu, Sadiq Y Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra, Rajaie Batniji
Objective: Population health management programs coordinate care for over 80 million Medicaid beneficiaries but lack systematic clinical decision support for determining when to intervene and which interventions to select for patients with complex conditions. Our objective was to develop and validate a clinical decision support system integrating acuity prediction and intervention selection models for population health management programs.
Materials and methods: We conducted a retrospective cohort study of 155 631 Medicaid patients enrolled in population health programs across Washington, Virginia, and Ohio (January 2023-July 2025). We developed integrated informatics workflows combining time-to-event prediction models for acute care events with heterogeneous treatment effect estimators for intervention selection. Models used structured electronic health record data, claims, and care management records. Performance was evaluated through clinical validation with 3 blinded physicians reviewing 200 cases.
Results: The integrated decision support system achieved 81.3% sensitivity (95% CI, 79.8%-82.8%) and 82.1% specificity (95% CI, 80.6%-83.6%) for 30-day acute care prediction. The intervention selection component demonstrated an absolute risk reduction of 1.59 percentage points compared with standard care (95% CI, 0.21-3.04), translating to preventing one acute event for every 63 patients receiving model-guided rather than standard care. Clinical validation revealed systematic differences: physicians relied on recent utilization patterns (explaining 75.8% of decision variance), while models integrated broader clinical signals, identifying intervention opportunities earlier in disease trajectories. Both approaches recommended similar intervention types, suggesting complementary rather than replacement roles.
Discussion: An integrated clinical decision support system can enhance population health management by providing actionable guidance on intervention timing and selection.
Conclusion: An integrated decision support system's ability to identify opportunities before high utilization manifests offers potential for shifting from reactive to preventive care delivery for vulnerable populations.
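The number-needed-to-treat figure in the results follows directly from the reported absolute risk reduction. A minimal check (the round-up-to-a-whole-patient convention is an assumption; the abstract does not state how 63 was derived):

```python
import math

def number_needed_to_treat(absolute_risk_reduction):
    """NNT = 1 / ARR, rounded up to a whole patient (a common,
    conservative convention)."""
    return math.ceil(1 / absolute_risk_reduction)

# Reported ARR of 1.59 percentage points → one acute event prevented
# for every 63 patients receiving model-guided care.
nnt = number_needed_to_treat(0.0159)  # 63
```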
Title: Clinical decision support for population health management: development and validation of integrated acuity and intervention prediction models.
Journal of the American Medical Informatics Association, published 2025-12-19. doi: 10.1093/jamia/ocaf225.
Shuang Wang, Yang Zhang, Ying Gao, Xin He, Guanghui Deng, Jian Du
Objectives: To develop and evaluate a knowledge graph-augmented large language model (LLM) framework that synthesizes epidemiological evidence to infer life-course exposure-outcome pathways, using gestational diabetes mellitus (GDM) and dementia as a case study.
Materials and methods: We constructed a causal knowledge graph by extracting empirical epidemiological associations from scientific literature, excluding hypothetical assertions. The graph was integrated with GPT-4 through four graph retrieval-augmented generation (GRAG) strategies to infer bridging variables between early-life exposure (GDM) and later-life outcome (dementia). Semantic triples served as structured inputs to support LLM reasoning. Each GRAG strategy was evaluated by human clinical experts and three LLM-based reviewers (GPT-4o, Llama 3-70B, and Gemini Advanced), assessing scientific reliability, novelty, and clinical relevance.
Results: The GRAG strategy using a minimal set of abstracts specifically related to GDM-dementia bridging variables performed comparably to the strategy using broader sub-community abstracts, and both significantly outperformed approaches using the full GDM- or dementia-related corpus or baseline GPT-4 without external augmentation. The knowledge graph-augmented LLM identified 108 maternal candidate mediators, including validated risk factors such as chronic kidney disease and physical inactivity. The structured approach improved accuracy and reduced confabulation compared to standard LLM outputs.
Discussion: Our findings suggest that augmenting LLMs with epidemiological knowledge graphs enables effective reasoning over fragmented literature and supports the reconstruction of progressive risk pathways. Expert assessments revealed that LLMs may overestimate clinical relevance, highlighting the need for human-AI collaboration in interpretation and application.
Conclusion: Integrating semantic epidemiological knowledge with LLMs via GRAG strategies provides a promising framework for life-course epidemiology, enabling early detection of modifiable risk factors and guiding variable selection in cohort study design.
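The bridging-variable retrieval at the core of the GRAG strategies can be sketched in plain Python. The triples, predicate names, and helper functions below are hypothetical illustrations of the idea (intersect variables downstream of the exposure with variables upstream of the outcome, then serialize the relevant triples as structured LLM input); they are not the authors' knowledge graph or pipeline.

```python
# Hypothetical semantic triples of the kind extracted from
# epidemiological abstracts (subject, predicate, object).
triples = [
    ("gestational diabetes", "increases_risk_of", "chronic kidney disease"),
    ("gestational diabetes", "increases_risk_of", "type 2 diabetes"),
    ("chronic kidney disease", "increases_risk_of", "dementia"),
    ("physical inactivity", "increases_risk_of", "dementia"),
    ("gestational diabetes", "associated_with", "physical inactivity"),
]

def bridging_variables(triples, exposure, outcome):
    """Candidate mediators: nodes reachable one hop downstream of the
    exposure that also sit one hop upstream of the outcome."""
    downstream = {o for s, _, o in triples if s == exposure}
    upstream = {s for s, _, o in triples if o == outcome}
    return sorted(downstream & upstream)

def to_prompt_context(triples, variables):
    """Serialize the triples touching the candidate variables as
    structured context for an LLM prompt."""
    return "\n".join(f"{s} --{p}--> {o}" for s, p, o in triples
                     if s in variables or o in variables)

bridges = bridging_variables(triples, "gestational diabetes", "dementia")
# bridges == ["chronic kidney disease", "physical inactivity"]
```

In the paper's framing, the serialized triples for these bridging variables (rather than whole corpora) are what get retrieved and handed to the LLM, which is what distinguished the best-performing minimal-abstract strategy from full-corpus augmentation.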
Title: Knowledge graph-augmented large language models for reconstructing life course risk pathways: a gestational diabetes mellitus-to-dementia case study.
Journal of the American Medical Informatics Association, published 2025-12-16. doi: 10.1093/jamia/ocaf219.
Samuel Dubin, Gabrielle Mayer, Nishant Pradhan, Madeline Xin, Richard Greene
Objectives: Documentation of gender identity (GI) and anatomy data in the electronic health record (EHR) is a proposed standard of care for transgender populations. However, there is limited research on implementation of proposed best practices, particularly anatomy data collection. This study aims to characterize factors that influence patient preferences and comfort around the collection and documentation of GI and anatomy in EHRs.
Materials and methods: From November 2023 to January 2024, 17 one-on-one, semi-structured virtual interviews were conducted with transgender adults residing in the Metropolitan New York area. Transcriptions were analyzed using inductive thematic analysis.
Results: Themes clustered around comfort and preferences for data collection processes and outcomes. Factors that influenced preferences and comfort around anatomy data were distinct from those impacting GI documentation preferences and comfort. The tension between the categories of GI and sex assigned at birth impacted anatomy data documentation preferences. Clinical context emerged as a consistent factor that impacts both preferences and comfort of GI and anatomy data documentation.
Discussion and conclusion: GI data collection efforts in clinical settings must consider the implication of anatomy data collection when determining data collection best practice methodologies. Anticipated and experienced stigma remain significant hurdles to patient comfort and willingness to collect GI and anatomy data, and their impact on actual data collection should be further elucidated among diverse gender identities. Clinical data collection methods, tools, and education warrant ongoing research investment to further elucidate best practices.
Title: Patient perspectives on gender identity and anatomy data collection in electronic health records: a qualitative study.
Journal of the American Medical Informatics Association, published 2025-12-11. doi: 10.1093/jamia/ocaf205.
Weihao Cheng, Zekai Yu
Title: Toward semantic interoperability of imaging and clinical data: reflections on the DICOM-OMOP integration framework.
Journal of the American Medical Informatics Association, published 2025-12-05. doi: 10.1093/jamia/ocaf215.
Andrew R Weckstein, Shirley V Wang, Richard Wyss, Sebastian Schneeweiss
Objectives: Real-world evidence (RWE) increasingly informs clinical decisions, yet manual adjustment for confounding limits scalability. Data-adaptive (DA) algorithms for high-dimensional proxy adjustment show promise but have not been systematically compared to investigator-specified (IS) approaches across diverse treatment scenarios. We evaluated whether DA strategies perform comparably to manually curated IS models using claims-based emulations of 15 randomized trials from the RCT-DUPLICATE initiative.
Materials and methods: We identified new-user cohorts for 15 trial emulations in Optum's de-identified Clinformatics Data Mart Database (2004-2023). Treatment effects were estimated using 3 adjustment strategies: (1) IS models with manually tailored covariates; (2) full-DA strategies using empirical features from semiautomated pipelines; and (3) hybrid-DA models incorporating both empirical and investigator-defined covariates. Agreement with RCT benchmarks was assessed via binary metrics and difference-in-differences.
Results: Outcome-adaptive LASSO achieved better RWE-RCT agreement than IS adjustment in 73% of full-DA and 87% of hybrid-DA emulations. Other DA methods considering feature associations with both treatment and outcome performed similarly well, while models tuned solely for treatment prediction performed poorly. Performance of IS vs DA strategies differed across emulated trials.
Discussion: Top DA algorithms matched manual IS models on average, but impact varied by emulation. Case studies illustrate the continued importance of subject-matter knowledge, particularly for complex treatment strategies.
Conclusion: Data-adaptive algorithms show promise for scalable confounding adjustment in large-scale evidence systems and as augmentation tools for investigator-specified designs. Hybrid strategies combining algorithmic methods with investigator expertise offer the most reliable approach for individual causal questions.
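The core idea behind the best-performing strategy above — screening covariates by their association with the *outcome* before adjusting the treatment effect — can be sketched in a few lines. This is a simplified illustration on synthetic data, not the RCT-DUPLICATE pipeline: a plain coordinate-descent LASSO stands in for the outcome-adaptive LASSO (which additionally uses adaptive penalty weights), and all variable names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 20
X = rng.standard_normal((n, p))

# Synthetic structure: X0, X1 are confounders (drive both treatment and
# outcome), X2 affects the outcome only, the remaining 17 columns are noise.
logit = 0.8 * X[:, 0] + 0.8 * X[:, 1]
A = rng.binomial(1, 1 / (1 + np.exp(-logit)))          # treatment
Y = 1.0 * A + 1.5 * X[:, 0] + 1.5 * X[:, 1] + 1.0 * X[:, 2] \
    + rng.standard_normal(n)                            # true effect = 1.0

def soft(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def lasso(X, y, lam, sweeps=100):
    """Plain coordinate-descent LASSO (no intercept; columns ~standardized)."""
    beta = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0) / len(y)
    for _ in range(sweeps):
        for j in range(X.shape[1]):
            resid = y - X @ beta + X[:, j] * beta[j]    # partial residual
            beta[j] = soft(X[:, j] @ resid / len(y), lam) / col_ss[j]
    return beta

# Outcome-model screening: keep covariates the LASSO ties to the outcome.
selected = np.flatnonzero(np.abs(lasso(X, Y - Y.mean(), lam=0.1)) > 0)

def ols_effect(A, Y, X_adj):
    """OLS coefficient on treatment A, adjusting for the columns of X_adj."""
    design = np.column_stack([np.ones(len(A)), A, X_adj])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[1]

naive = ols_effect(A, Y, np.empty((n, 0)))     # unadjusted: confounded
adjusted = ols_effect(A, Y, X[:, selected])    # adjusted for selected set
print(sorted(selected), round(naive, 2), round(adjusted, 2))
```

The contrast between `naive` and `adjusted` shows why outcome-associated screening works here: the confounders X0 and X1 are recovered by the outcome model, while a model tuned only to predict treatment would also sweep in instruments, which the abstract reports degrade performance.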
{"title":"Scalable confounding adjustment in real-world evidence: benchmarking data-adaptive and investigator-specified strategies in a large-scale trial emulation study.","authors":"Andrew R Weckstein, Shirley V Wang, Richard Wyss, Sebastian Schneeweiss","doi":"10.1093/jamia/ocaf204","DOIUrl":"https://doi.org/10.1093/jamia/ocaf204","url":null,"abstract":"<p><strong>Objectives: </strong>Real-world evidence (RWE) increasingly informs clinical decisions, yet manual adjustment for confounding limits scalability. Data-adaptive (DA) algorithms for high-dimensional proxy adjustment show promise but have not been systematically compared to investigator-specified (IS) approaches across diverse treatment scenarios. We evaluated whether DA strategies perform comparably to manually curated IS models using claims-based emulations of 15 randomized trials from the RCT-DUPLICATE initiative.</p><p><strong>Materials and methods: </strong>We identified new-user cohorts for 15 trial emulations in Optum's de-identified Clinformatics Data Mart Database (2004-2023). Treatment effects were estimated using 3 adjustment strategies: (1) IS models with manually tailored covariates; (2) full-DA strategies using empirical features from semiautomated pipelines; and (3) hybrid-DA models incorporating both empirical and investigator-defined covariates. Agreement with RCT benchmarks was assessed via binary metrics and difference-in-differences.</p><p><strong>Results: </strong>Outcome-adaptive LASSO achieved better RWE-RCT agreement than IS adjustment in 73% of full-DA and 87% of hybrid-DA emulations. Other DA methods considering feature associations with both treatment and outcome performed similarly well, while models tuned solely for treatment prediction performed poorly. Performance of IS vs DA strategies differed across emulated trials.</p><p><strong>Discussion: </strong>Top DA algorithms matched manual IS models on average, but impact varied by emulation. 
Case studies illustrate the continued importance of subject-matter knowledge, particularly for complex treatment strategies.</p><p><strong>Conclusion: </strong>Data-adaptive algorithms show promise for scalable confounding adjustment in large-scale evidence systems and as augmentation tools for investigator-specified designs. Hybrid strategies combining algorithmic methods with investigator expertise offer the most reliable approach for individual causal questions.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145670308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction: Automation of clinical orders in electronic health records (EHRs) has the potential to reduce clinician burden and enhance patient safety. However, determining which orders are appropriate for automation requires a structured framework to ensure clinical validity, transparency, and safety.
Objective: To develop and validate a framework of desiderata for assessing the appropriateness of automating clinical orders in EHRs and to demonstrate its operational value in a live health system dataset.
Materials and methods: The study comprised 4 phases to move from concept generation to real-world demonstration. First, we conducted focus group analyses using grounded theory to identify themes and developed desiderata informed by these themes and existing literature. We validated the desiderata by surveying clinicians at a single institution, presenting 10 use cases and assessing perceived appropriateness, cognitive support, and patient safety using a 4-point Likert scale. Survey results were compared to a priori appropriateness designations using t-tests. To evaluate operational impact, we analyzed one year of order-based alerts and orders (1.4 million alert firings and 44.1 million orders, respectively) using filtering rules and association rule mining to identify candidate orders for automation and their impact.
Results: We identified 8 desiderata for automated order appropriateness: logical consistency, data provenance, order transparency, context permanence, monitoring plans, trigger consistency, care team empowerment, and system accountability. Use cases deemed appropriate based on these criteria received significantly higher scores for appropriateness (3.13 ± 0.84 vs 2.30 ± 0.99), cognitive support (3.08 ± 0.82 vs 2.25 ± 0.94), and patient safety (3.08 ± 0.86 vs 2.21 ± 0.98) (all P < .001) compared to those considered inappropriate. Operational analysis revealed an alert firing 19 109 times annually, with a 96% signed order rate, where automation could save an estimated 26.5 provider hours per year. Additionally, an association rule with 16 628 occurrences (68.4% confidence) suggested automation could save 15.8 hours annually and yield 8000 additional appropriate orders.
Discussion: The desiderata align with clinician perceptions and provide a structured approach for evaluating automated orders. Our findings highlight the potential for automation of certain clinical orders to improve cognitive support while maintaining patient safety.
Conclusion: Healthcare systems should use these desiderata, coupled with data mining techniques, to systematically identify and govern appropriate automated orders. Further research is needed to validate operational scalability.
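The association-rule statistics quoted above (occurrence counts and confidence, e.g. 68.4%) can be computed from encounter-level order "baskets." A minimal sketch, assuming hypothetical order names and toy data — not the authors' mining pipeline:

```python
# Hypothetical encounter-level order baskets; order names are illustrative.
encounters = [
    {"cbc", "bmp", "troponin"},
    {"cbc", "bmp"},
    {"cbc", "bmp", "lactate"},
    {"cbc", "troponin"},
    {"bmp", "lactate"},
    {"cbc", "bmp", "troponin"},
]

def rule_stats(encounters, antecedent, consequent):
    """Support and confidence for the rule: antecedent -> consequent.

    support    = fraction of encounters containing both item sets
    confidence = P(consequent present | antecedent present)
    """
    n = len(encounters)
    n_ante = sum(antecedent <= e for e in encounters)
    n_both = sum((antecedent | consequent) <= e for e in encounters)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

support, confidence = rule_stats(encounters, {"cbc"}, {"bmp"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

High-confidence, high-count rules like this are the candidates the abstract describes surfacing for automation review; governance against the 8 desiderata would then decide whether auto-placing the consequent order is appropriate.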
{"title":"Development and application of desiderata for automated clinical ordering.","authors":"Sameh N Saleh, Kevin B Johnson","doi":"10.1093/jamia/ocaf152","DOIUrl":"10.1093/jamia/ocaf152","url":null,"abstract":"<p><strong>Introduction: </strong>Automation of clinical orders in electronic health records (EHRs) has the potential to reduce clinician burden and enhance patient safety. However, determining which orders are appropriate for automation requires a structured framework to ensure clinical validity, transparency, and safety.</p><p><strong>Objective: </strong>To develop and validate a framework of desiderata for assessing the appropriateness of automating clinical orders in EHRs and to demonstrate its operational value in a live health system dataset.</p><p><strong>Materials and methods: </strong>The study comprised 4 phases to move from concept generation to real-world demonstration. First, we conducted focus group analyses using grounded theory to identify themes and developed desiderata informed by these themes and existing literature. We validated the desiderata by surveying clinicians at a single institution, presenting 10 use cases and assessing perceived appropriateness, cognitive support, and patient safety using a 4-point Likert scale. Survey results were compared to a priori appropriateness designations using t-tests. To evaluate operational impact, we analyzed one year of order-based alerts and orders (1.4 million alert firings and 44.1 million orders, respectively) using filtering rules and association rule mining to identify candidate orders for automation and their impact.</p><p><strong>Results: </strong>We identified 8 desiderata for automated order appropriateness: logical consistency, data provenance, order transparency, context permanence, monitoring plans, trigger consistency, care team empowerment, and system accountability. 
Use cases deemed appropriate based on these criteria received significantly higher scores for appropriateness (3.13 ± 0.84 vs 2.30 ± 0.99), cognitive support (3.08 ± 0.82 vs 2.25 ± 0.94), and patient safety (3.08 ± 0.86 vs 2.21 ± 0.98) (all P < .001) compared to those considered inappropriate. Operational analysis revealed an alert firing 19 109 times annually, with a 96% signed order rate, where automation could save an estimated 26.5 provider hours per year. Additionally, an association rule with 16 628 occurrences (68.4% confidence) suggested automation could save 15.8 hours annually and yield 8000 additional appropriate orders.</p><p><strong>Discussion: </strong>The desiderata align with clinician perceptions and provide a structured approach for evaluating automated orders. Our findings highlight the potential for automation of certain clinical orders to improve cognitive support while maintaining patient safety.</p><p><strong>Conclusion: </strong>Healthcare systems should use these desiderata, coupled with data mining techniques, to systematically identify and govern appropriate automated orders. Further research is needed to validate operational scalability.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"1899-1907"},"PeriodicalIF":4.6,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646375/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145126405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}