
Journal of the American Medical Informatics Association: Latest Articles

Evaluating the potential of fast healthcare interoperability resources for clinical registry data submission.
IF 4.6; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-03-09; DOI: 10.1093/jamia/ocag029
James E Tcheng, David Finney, Keith Boone, Samit P Desai, David A Pyke, Nandan Shanbhag, Ganesan Srinivasan, Nick Ramsing, Mark D Kelemen

Objective: We conducted the Clinical Registry Extraction and Data Submission (CREDS) project to evaluate the readiness of HL7 Fast Healthcare Interoperability Resources (FHIR) for provisioning data from health information systems for the American College of Cardiology Cardiac Catheterization Percutaneous Coronary Intervention (CathPCI) Registry.

Materials and methods: The CREDS project had 3 workstreams: (1) evaluation of the readiness of clinical documentation for data transforms, (2) modeling of a FHIR-based clinical workflow for registry data submission, and (3) development and demonstration of a CREDS FHIR implementation for registry data submission.

Results: Of the 344 data concepts comprising the CathPCI Registry, only 111 (32%) were sufficiently discrete to be listed in the CathPCI Data Dictionary with a terminology mapping. Cardiologist informaticians identified an additional 42 concepts suitable for provisioning via a FHIR payload. The resulting notional workflow combined FHIR-based data assembly with manual chart abstraction of compound, summative, and complex clinical concepts. A CathPCI FHIR StructureDefinition artifact was authored, incorporated into a CREDS FHIR Implementation Guide, and balloted to Standard for Trial Use status.

Discussion: CREDS demonstrated both potential and limitations for using FHIR for registry data submission. The largest technical impediment was the volume of code (>11 000 lines) for the FHIR StructureDefinition. Lack of regularized clinical vocabularies, reliance of registries on complex clinical concepts, and absence of FHIR infrastructure must be overcome before CREDS can be used at scale.

Conclusion: CREDS demonstrated proof-of-concept FHIR-based provisioning of clinical data for registry submission. All artifacts are open source to inform others with similar interests.
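
The coverage figures in the Results above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the three counts come from the abstract, while the variable names and the assumption that every concept not provisioned via FHIR falls to manual chart abstraction are ours, not the authors'.

```python
# Counts taken from the Results paragraph of the abstract.
TOTAL_CONCEPTS = 344       # data concepts in the CathPCI Registry
MAPPED_CONCEPTS = 111      # listed in the Data Dictionary with a terminology mapping
ADDITIONAL_CONCEPTS = 42   # further concepts judged suitable for a FHIR payload


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# "Additional" is read as disjoint from the 111 mapped concepts.
fhir_candidates = MAPPED_CONCEPTS + ADDITIONAL_CONCEPTS

# Assumption: whatever is not a FHIR candidate is left to manual abstraction.
manual_remainder = TOTAL_CONCEPTS - fhir_candidates

print(f"mapped: {share(MAPPED_CONCEPTS, TOTAL_CONCEPTS):.0f}%")
print(f"FHIR candidates: {fhir_candidates} "
      f"({share(fhir_candidates, TOTAL_CONCEPTS):.0f}%)")
print(f"left for manual abstraction: {manual_remainder}")
```

Under these assumptions, fewer than half of the registry's concepts would be candidates for FHIR provisioning, which is consistent with the hybrid FHIR-plus-abstraction workflow the abstract describes.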

Citations: 0
Multi-site analysis of COVID-19 and new-onset diabetes reveals need for improved sensitivity of EHR-based COVID-19 phenotypes-a DiCAYA Network analysis.
IF 4.6; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-03-01; DOI: 10.1093/jamia/ocaf229; Pages: 710-718
Sarah Conderino, H Lester Kirchner, Lorna E Thorpe, Jasmin Divers, Annemarie G Hirsch, Cara M Nordberg, Brian S Schwartz, Lu Zhang, Bo Cai, Caroline Rudisill, Jihad S Obeid, Angela Liese, Katie S Allen, Brian E Dixon, Tessa Crume, Dana Dabelea, Shawna Burgett, Anna Bellatorre, Hui Shao, Jiang Bian, Yi Guo, Sarah Bost, Tianchen Lyu, Kristi Reynolds, Matthew T Mefford, Hui Zhou, Matt Zhou, Eva Lustigova, Levon H Utidjian, Mitchell Maltenfort, Manmohan Kamboj, Eneida A Mendonca, Patrick Hanley, Ibrahim Zaganjor, Meda E Pavkov, Marc Rosenman, Andrea R Titus

Objective: We discuss implications of potential ascertainment biases for studies examining diabetes risk following SARS-CoV-2 infection using electronic health records (EHRs). We quantitatively explore sensitivity of results to misclassification of COVID-19 status using data from the U.S.-based Diabetes in Children, Adolescents and Young Adults (DiCAYA) Network on children (≤17 years) and young adults (18-44 years).

Materials and methods: In our retrospective case study from the DiCAYA Network, SARS-CoV-2 was identified using labs and diagnoses from June 1, 2020 to December 31, 2021. Patients were followed through December 31, 2022 for new diabetes diagnoses. Sites examined incident diabetes by COVID-19 status using Cox proportional hazards models. Results were pooled in meta-analyses. A bias analysis examined potential impact of COVID-19 misclassification scenarios on results, guided by hypotheses that sensitivity would be <50% and would be higher among those who developed diabetes.

Results: Prevalence of documented COVID-19 was low overall and variable across sites (children: 4.4%-7.7%, young adults: 6.2%-22.7%). Individuals with documented COVID-19 were at higher risk of incident diabetes compared to those with no documented infection, but results were heterogeneous across sites. Findings were highly sensitive to COVID-19 misclassification assumptions. Observed results could be biased away from the null under several differential misclassification scenarios.

Discussion: Although EHR-based documentation of COVID-19 was associated with incident diabetes, COVID-19 phenotypes likely had low sensitivity, with considerable variation across sites. Misclassification assumptions strongly impacted interpretation of results.

Conclusion: Given the potential for low phenotype sensitivity and misclassification, caution is warranted when interpreting analyses of COVID-19 and incident diabetes using clinical or administrative databases.
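
To make the misclassification argument above concrete, here is a minimal sketch of a simple quantitative bias analysis on a 2x2 table. It is not the authors' method (the sites fitted Cox proportional hazards models); it applies the textbook back-calculation for exposure misclassification, and every count and sensitivity value below is hypothetical and chosen only for illustration.

```python
def corrected_exposed(observed_exposed: float, group_total: float,
                      sensitivity: float, specificity: float) -> float:
    """Back-calculate the true number exposed within one outcome group."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_exposed - (1.0 - specificity) * group_total) / denom


def risk_ratio(exp_cases: float, exp_noncases: float,
               unexp_cases: float, unexp_noncases: float) -> float:
    """Risk of the outcome among the exposed divided by risk among the unexposed."""
    risk_exposed = exp_cases / (exp_cases + exp_noncases)
    risk_unexposed = unexp_cases / (unexp_cases + unexp_noncases)
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 200 people develop diabetes, 9800 do not.
cases_total, noncases_total = 200, 9800
obs_exp_cases, obs_exp_noncases = 30, 700   # documented COVID-19 in each group

observed_rr = risk_ratio(obs_exp_cases, obs_exp_noncases,
                         cases_total - obs_exp_cases,
                         noncases_total - obs_exp_noncases)

# Differential scenario in the spirit of the abstract's hypotheses: sensitivity
# below 50% overall and higher among those who developed diabetes.
# Specificity is assumed perfect (no false-positive COVID-19 documentation).
true_exp_cases = corrected_exposed(obs_exp_cases, cases_total, 0.50, 1.0)
true_exp_noncases = corrected_exposed(obs_exp_noncases, noncases_total, 0.40, 1.0)

corrected_rr = risk_ratio(true_exp_cases, true_exp_noncases,
                          cases_total - true_exp_cases,
                          noncases_total - true_exp_noncases)

print(f"observed RR:  {observed_rr:.2f}")
print(f"corrected RR: {corrected_rr:.2f}")
```

With these made-up inputs the corrected risk ratio is smaller than the observed one, which illustrates how an observed association can be biased away from the null when infections are better documented among people who later develop diabetes.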

Citations: 0
Identifying and supporting trafficked individuals: provider and community organization perspectives on existing sociotechnical approaches.
IF 4.6; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-03-01; DOI: 10.1093/jamia/ocaf220; Pages: 641-652
Michelle Gomez, Ellen W Clayton, Colin G Walsh, Kim M Unertl

Objectives: Trafficked persons experience adverse health consequences and seek help, but many go unrecognized by health-care professionals. This study explored professionals' perspectives on current approaches toward identifying and supporting trafficked persons in health-care settings, highlighting current technology roles, gaps, and future directions.

Materials and methods: We developed an interview guide to investigate current human trafficking (HT) approaches, safety procedures, and HT education. Semistructured interviews were conducted via Zoom, iteratively coded in Dedoose, and analyzed using a thematic analysis approach.

Results: We interviewed 19 health-care and community group professionals and identified 3 themes: (1) participants described a responsibility to build trust with patients through compassionate communication, rapport, and trauma-informed approaches across different stages of care. (2) Technology played a dual role, as professionals navigated both benefits and challenges of tools such as Zoom, virtual interpreters, and cameras in trust building. (3) Safety and privacy concerns guided how participants documented patient encounters and shared community resources, ensuring confidentiality while supporting patient and community well-being.

Discussion: Technology can both support and hinder trust in health care, directly affecting trafficked patients and their safety. Informatics can improve care for trafficked persons, but further research is needed on technology-based interventions. We provide recommendations to strengthen trust, enhance safety, support trauma-informed care, and promote safe documentation practices.

Conclusion: Effective sociotechnical approaches rely on trust, safety, and mindful documentation to support trafficked patients. Future research directions include refining the role of informatics in trauma-informed care to strengthen trust and mitigate unintended consequences.

Citations: 0
Improved evaluation frameworks are required to move application of LLMs from research into clinical practice.
IF 4.6; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-03-01; DOI: 10.1093/jamia/ocag016; Volume 33, Issue 3; Pages: 551-552
Suzanne Bakken
Citations: 0
AutoReporter: development of an artificial intelligence tool for automated assessment of research reporting guideline adherence.
IF 4.6; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-03-01; DOI: 10.1093/jamia/ocaf223; Pages: 724-731
David Chen, Patrick Li, Ealia Khoshkish, Seungmin Lee, Tony Ning, Umair Tahir, Henry C Y Wong, Michael S F Lee, Srinivas Raman

Objectives: To develop AutoReporter, a large language model (LLM) system that automates evaluation of adherence to research reporting guidelines.

Materials and methods: Eight prompt-engineering and retrieval strategies coupled with reasoning and general-purpose LLMs were benchmarked on the SPIRIT-CONSORT-TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews.

Results: AutoReporter, a zero-shot, no-retrieval prompt coupled with the o3-mini reasoning LLM, demonstrated strong accuracy (CONSORT: 90.09%; SPIRIT: 92.07%) and substantial agreement with humans (CONSORT: Cohen's κ = 0.70; SPIRIT: Cohen's κ = 0.77), with runtimes of 617.26 s (CONSORT) and 544.51 s (SPIRIT) and costs of 0.68 USD (CONSORT) and 0.65 USD (SPIRIT). AutoReporter achieved a mean accuracy of 91.8% and substantial agreement (Cohen's κ > 0.6) with expert ratings from the BenchReport benchmark.

Discussion: Structured prompting alone can match or exceed fine-tuned domain models while forgoing manually annotated corpora and computationally intensive training.

Conclusion: Large language models can feasibly automate reporting guideline adherence assessments for scalable quality control in scientific research reporting. AutoReporter is publicly accessible at https://autoreporter.streamlit.app.
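
The agreement statistic reported above, Cohen's κ, corrects raw agreement for the agreement two raters would reach by chance. The sketch below shows the standard formula in plain Python; the two rating lists are hypothetical (1 = item reported, 0 = not reported) and are not drawn from the SPIRIT-CONSORT-TM corpus or BenchReport.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Proportion of items on which the two raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings for ten checklist items.
model_ratings = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
expert_ratings = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]

kappa = cohens_kappa(model_ratings, expert_ratings)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on 8 of 10 items, but κ is noticeably lower than 0.8 because about half of that agreement is expected by chance given how often each rater assigns each label.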

Citations: 0
Patient perspectives on gender identity and anatomy data collection in electronic health records: a qualitative study.
IF 4.6; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-03-01; DOI: 10.1093/jamia/ocaf205; Pages: 587-592
Samuel Dubin, Gabrielle Mayer, Nishant Pradhan, Madeline Xin, Richard Greene

Objectives: Documentation of gender identity (GI) and anatomy data in the electronic health record (EHR) is a proposed standard of care for transgender populations. However, there is limited research on implementation of proposed best practices, particularly anatomy data collection. This study aims to characterize factors that influence patient preferences and comfort around the collection and documentation of GI and anatomy in EHRs.

Materials and methods: From November 2023 to January 2024, 17 one-on-one, semi-structured virtual interviews were conducted with transgender adults residing in the Metropolitan New York area. Transcriptions were analyzed using inductive thematic analysis.

Results: Themes clustered around comfort and preferences for data collection processes and outcomes. Factors that influenced preferences and comfort around anatomy data were distinct from those impacting GI documentation preferences and comfort. The tension between the categories of GI and sex assigned at birth impacted anatomy data documentation preferences. Clinical context emerged as a consistent factor that impacts both preferences and comfort of GI and anatomy data documentation.

Discussion and conclusion: GI data collection efforts in clinical settings must consider the implication of anatomy data collection when determining data collection best practice methodologies. Anticipated and experienced stigma remain significant hurdles to patient comfort and willingness to collect GI and anatomy data, and their impact on actual data collection should be further elucidated among diverse gender identities. Clinical data collection methods, tools, and education warrant ongoing research investment to further elucidate best practices.

Citations: 0
Heterogenous effect of automated alerts on mortality.
IF 4.6; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-03-01; DOI: 10.1093/jamia/ocaf222
Benjamin D Wissel, Zana Percy, Tanner J Zachem, Brett Beaulieu-Jones, Isaac S Kohane, Stuart L Goldstein, Emrah Gecili, Judith W Dexheimer

Objective: To understand the heterogeneous treatment effects of electronic alerts for acute kidney injury (AKI).

Materials and methods: Secondary analysis of individual patient data from 3 randomized controlled trials. Our outcome measure was 14-day all-cause mortality. Data from the ELAIA-1 trial were used to predict the individualized effect of alerts on mortality based on patients' phenotype. Results were internally validated on a holdout dataset and externally validated using data from 2 additional trials: UPenn and ELAIA-2. We used machine learning-based methods and performed a meta-analysis on individual patient data to identify patient subgroups whose risk of mortality was associated with alerts. In addition, provider actions following alerts were examined to explain how alerts impacted patient mortality.

Results: Compared to patients who were predicted to be harmed by an alert, patients predicted to benefit had a lower risk of death in both the internal validation cohort (n = 1809 patients; P for interaction = .045) and both external validation cohorts (n = 7453 patients; P for interaction < .0001). In external cohorts, 43 deaths may have been preventable if alerts were restricted to likely beneficiaries. Machine learning-based meta-analysis identified reduced mortality with alerts among patients with higher blood pressures (BP) and lower predicted risk, but increased mortality in non-urban and non-teaching hospitals. Provider responses to alerts differed across subgroups.

Discussion: Our findings indicate substantial heterogeneity in the effects of AKI alerts on patient mortality. Tailoring alert delivery based on predicted benefit may mitigate harm and enhance clinical outcomes.

Conclusion: Individualizing automated alerts may reduce all-cause mortality. A prospective trial of individualized alerts is needed to confirm these results.

Trial registration: https://clinicaltrials.gov/ct2/show/NCT02753751 and https://clinicaltrials.gov/ct2/show/NCT02771977.

Pages: 653-662. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981663/pdf/
Citations: 0
Information extraction from clinical notes: are we ready to switch to large language models?
IF 4.6 Zone 2 Medicine Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-01 DOI: 10.1093/jamia/ocaf213
Yan Hu, Xu Zuo, Yujia Zhou, Xueqing Peng, Jimin Huang, Vipina K Keloth, Vincent J Zhang, Ruey-Ling Weng, Cathy Shyr, Qingyu Chen, Xiaoqian Jiang, Kirk E Roberts, Hua Xu

Objectives: To assess the performance, generalizability, and computational efficiency of instruction-tuned Large Language Model Meta AI (LLaMA)-2 and LLaMA-3 models compared to bidirectional encoder representations from transformers (BERT) for clinical information extraction (IE) tasks, specifically named entity recognition (NER) and relation extraction (RE).

Materials and methods: We developed a comprehensive annotated corpus of 1588 clinical notes from 4 data sources: UT Physicians (UTP) (1342 notes), Transcribed Medical Transcription Sample Reports and Examples (MTSamples) (146), Medical Information Mart for Intensive Care (MIMIC)-III (50), and Informatics for Integrating Biology and the Bedside (i2b2) (50). The corpus captures 4 clinical entities (problems, tests, medications, other treatments) and 16 modifiers (eg, negation, certainty). LLaMA-2 and LLaMA-3 were instruction-tuned for clinical NER and RE, and their performance was benchmarked against BERT.

Results: LLaMA models consistently outperformed BERT across datasets. In data-rich settings (eg, UTP), LLaMA achieved marginal gains (approximately 1% improvement for NER and 1.5%-3.7% for RE). Under limited data conditions (eg, MTSamples, MIMIC-III) and on the unseen i2b2 dataset, LLaMA-3-70B improved F1 scores by over 7% for NER and 4% for RE. However, performance gains came with increased computational costs, with LLaMA models requiring more memory and Graphics Processing Unit (GPU) hours and running up to 28 times slower than BERT.

Discussion: While LLaMA models offer enhanced performance, their higher computational demands and slower throughput highlight the need to balance performance with practical resource constraints. Application-specific considerations are essential when choosing between LLMs and BERT for clinical IE.

Conclusion: Instruction-tuned LLaMA models show promise for clinical NER and RE tasks. However, the tradeoff between improved performance and increased computational cost must be carefully evaluated. We release our Kiwi package (https://kiwi.clinicalnlp.org/) to facilitate the application of both LLaMA and BERT models in clinical IE applications.

Pages: 553-562. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981642/pdf/
Citations: 0
Optimizing example-selection in retrieval-augmented biomedical in-context learning: reflections on the MMRAG study.
IF 4.6 Zone 2 Medicine Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-01 DOI: 10.1093/jamia/ocaf236
Weihao Cheng
Pages: 779-780. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981649/pdf/
Citations: 0
Scalable confounding adjustment in real-world evidence: benchmarking data-adaptive and investigator-specified strategies in a large-scale trial emulation study.
IF 4.6 Zone 2 Medicine Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-03-01 DOI: 10.1093/jamia/ocaf204
Andrew R Weckstein, Shirley V Wang, Richard Wyss, Sebastian Schneeweiss

Objectives: Real-world evidence (RWE) increasingly informs clinical decisions, yet manual adjustment for confounding limits scalability. Data-adaptive (DA) algorithms for high-dimensional proxy adjustment show promise but have not been systematically compared to investigator-specified (IS) approaches across diverse treatment scenarios. We evaluated whether DA strategies perform comparably to manually curated IS models using claims-based emulations of 15 randomized trials from the RCT-DUPLICATE initiative.

Materials and methods: We identified new-user cohorts for 15 trial emulations in Optum's de-identified Clinformatics Data Mart Database (2004-2023). Treatment effects were estimated using 3 adjustment strategies: (1) IS models with manually tailored covariates; (2) full-DA strategies using empirical features from semiautomated pipelines; and (3) hybrid-DA models incorporating both empirical and investigator-defined covariates. Agreement with RCT benchmarks was assessed via binary metrics and difference-in-differences.

Results: Outcome-adaptive LASSO achieved better RWE-RCT agreement than IS adjustment in 73% of full-DA and 87% of hybrid-DA emulations. Other DA methods considering feature associations with both treatment and outcome performed similarly well, while models tuned solely for treatment prediction performed poorly. Performance of IS vs DA strategies differed across emulated trials.

Discussion: Top DA algorithms matched manual IS models on average, but impact varied by emulation. Case studies illustrate the continued importance of subject-matter knowledge, particularly for complex treatment strategies.

Conclusion: Data-adaptive algorithms show promise for scalable confounding adjustment in large-scale evidence systems and as augmentation tools for investigator-specified designs. Hybrid strategies combining algorithmic methods with investigator expertise offer the most reliable approach for individual causal questions.

Pages: 573-586.
Citations: 0