Journal of the American Medical Informatics Association: Latest Articles (Page 6)

A novel generative multi-task representation learning approach for predicting postoperative complications in cardiac surgery patients.

IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-28 DOI: 10.1093/jamia/ocae316

Junbo Shen, Bing Xue, Thomas Kannampallil, Chenyang Lu, Joanna Abraham

Objective: Early detection of surgical complications allows for timely therapy and proactive risk mitigation. Machine learning (ML) can be leveraged to identify and predict patient risks for postoperative complications. We developed a novel surgical Variational Autoencoder (surgVAE) that uncovers intrinsic patterns via cross-task and cross-cohort representation learning, and validated its effectiveness for predicting postoperative complications.

Materials and methods: This retrospective cohort study used data from the electronic health records of adult surgical patients over 4 years (2018-2021). Six key postoperative complications for cardiac surgery were assessed: acute kidney injury, atrial fibrillation, cardiac arrest, deep vein thrombosis or pulmonary embolism, blood transfusion, and other intraoperative cardiac events. We compared surgVAE's prediction performance against widely used ML models and advanced representation learning and generative models under 5-fold cross-validation.

Results: 89 246 surgeries (49% male, median [IQR] age: 57 [45-69]) were included, with 6502 in the targeted cardiac surgery cohort (61% male, median [IQR] age: 60 [53-70]). surgVAE demonstrated generally superior performance over existing ML solutions across postoperative complications of cardiac surgery patients, achieving macro-averaged AUPRC of 0.409 and macro-averaged AUROC of 0.831, which were 3.4% and 3.7% higher, respectively, than the best alternative method (by AUPRC scores). Model interpretation using Integrated Gradients highlighted key risk factors based on preoperative variable importance.

Discussion and conclusion: Our advanced representation learning framework surgVAE showed excellent discriminatory performance in predicting postoperative complications while addressing the challenges of data complexity, small cohort sizes, and low-frequency positive events. surgVAE enables data-driven predictions of patient risks and prognosis while enhancing the interpretability of patient risk profiles.
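
The Results report macro-averaged AUPRC and AUROC over the six complications. As a rough illustration of what macro-averaging means, the sketch below scores each complication (task) separately and then takes the unweighted mean, so a rare complication counts as much as a common one. It is a minimal pure-Python sketch, not the authors' evaluation code; the function names and toy labels and scores are hypothetical, and AUPRC is approximated by average precision assuming no tied scores.

```python
from typing import Callable, Dict, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimate: mean precision at the rank of each positive (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def macro_average(
    metric: Callable[[Sequence[int], Sequence[float]], float],
    tasks: Dict[str, Tuple[Sequence[int], Sequence[float]]],
) -> float:
    """Unweighted mean of a per-task metric; every complication gets equal weight."""
    values = [metric(y, s) for y, s in tasks.values()]
    return sum(values) / len(values)
```

Because the mean is unweighted, a model that performs poorly on a low-prevalence complication is penalized as much as one that performs poorly on a common one, which is one reason macro-averaged AUPRC is informative for low-frequency positive events.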

Citations: 0

Enhancing patient representation learning with inferred family pedigrees improves disease risk prediction.

IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-26 DOI: 10.1093/jamia/ocae297

Xiayuan Huang, Jatin Arora, Abdullah Mesut Erzurumluoglu, Stephen A Stanhope, Daniel Lam, Hongyu Zhao, Zhihao Ding, Zuoheng Wang, Johann de Jong

Background: Machine learning and deep learning are powerful tools for analyzing electronic health records (EHRs) in healthcare research. Although family health history has been recognized as a major predictor for a wide spectrum of diseases, research has so far adopted a limited view of family relations, essentially treating patients as independent samples in the analysis.

Methods: To address this gap, we present ALIGATEHR, which models inferred family relations in a graph attention network augmented with an attention-based medical ontology representation, thus accounting for the complex influence of genetics, shared environmental exposures, and disease dependencies.

Results: Taking disease risk prediction as a use case, we demonstrate that explicitly modeling family relations significantly improves predictions across the disease spectrum. We then show how ALIGATEHR's attention mechanism, which links patients' disease risk to their relatives' clinical profiles, successfully captures genetic aspects of diseases using longitudinal EHR diagnosis data. Finally, we use ALIGATEHR to successfully distinguish the 2 main inflammatory bowel disease subtypes with highly shared risk factors and symptoms (Crohn's disease and ulcerative colitis).

Conclusion: Overall, our results highlight that family relations should not be overlooked in EHR research and illustrate ALIGATEHR's great potential for enhancing patient representation learning for predictive and interpretable modeling of EHRs.
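
The Methods describe a graph attention network over inferred family relations. The sketch below shows a generic single-head graph-attention update (the standard GAT scoring e_ij = LeakyReLU(a · [W h_i ‖ W h_j]) followed by a softmax over the patient and their relatives). It is an illustration of the general technique under stated assumptions, not ALIGATEHR's architecture: the ontology-attention component is omitted, and the node names, feature vectors, W, and a are hypothetical toy values.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_update(
    node: str,
    features: Dict[str, Vector],
    relatives: Dict[str, List[str]],
    W: Matrix,
    a: Vector,
) -> Tuple[Vector, Dict[str, float]]:
    """Single-head GAT-style update of `node` from itself and its relatives.

    Returns the new embedding and the attention weight assigned to each neighbor.
    """
    neighborhood = [node] + [r for r in relatives.get(node, []) if r != node]
    z = {n: matvec(W, features[n]) for n in neighborhood}
    # Unnormalized score e_ij = LeakyReLU(a . [z_i || z_j]); list + list concatenates.
    scores = {
        j: leaky_relu(sum(ak * v for ak, v in zip(a, z[node] + z[j])))
        for j in neighborhood
    }
    # Numerically stable softmax over the neighborhood.
    m = max(scores.values())
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exps.values())
    alpha = {j: e / total for j, e in exps.items()}
    # New embedding: attention-weighted sum of transformed neighbor features.
    dim = len(z[node])
    out = [sum(alpha[j] * z[j][k] for j in neighborhood) for k in range(dim)]
    return out, alpha
```

The returned `alpha` weights are what make this kind of model inspectable: they indicate how much each relative's profile contributed to the patient's updated representation.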

Citations: 0

Analysis of eligibility criteria clusters based on large language models for clinical trial design.

IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-26 DOI: 10.1093/jamia/ocae311

Alban Bornet, Philipp Khlebnikov, Florian Meer, Quentin Haas, Anthony Yazdani, Boya Zhang, Poorya Amini, Douglas Teodoro

Objectives: Clinical trials (CTs) are essential for improving patient care by evaluating new treatments' safety and efficacy. A key component in CT protocols is the study population defined by the eligibility criteria. This study aims to evaluate the effectiveness of large language models (LLMs) in encoding eligibility criterion information to support CT-protocol design.

Materials and methods: We extracted eligibility criterion sections, phases, conditions, and interventions from CT protocols available in the ClinicalTrials.gov registry. Eligibility sections were split into individual rules using a criterion tokenizer and embedded using LLMs. The obtained representations were then clustered. The quality and relevance of the clusters for protocol design were evaluated through 3 experiments: intrinsic alignment with protocol information and human expert assessment of cluster coherence, extrinsic evaluation through CT-level classification tasks, and eligibility section generation.

Results: Sentence embeddings fine-tuned using biomedical corpora produce clusters with the highest alignment to CT-level information. Human expert evaluation confirms that clusters are well structured and coherent. Despite the high information compression, clusters retain significant CT information, up to 97% of the classification performance obtained with raw embeddings. Finally, eligibility sections automatically generated using clusters achieve 95% of the ROUGE scores obtained with a generative LLM prompted with CT-protocol details, suggesting that clusters encapsulate information useful to CT-protocol design.

Discussion: Clusters derived from sentence-level LLM embeddings effectively summarize complex eligibility criterion data while retaining relevant CT-protocol details. Clustering-based approaches provide a scalable enhancement in CT design that balances information compression with accuracy.

Conclusions: Clustering eligibility criteria using LLM embeddings provides a practical and efficient method to summarize critical protocol information. We provide an interactive visualization of the pipeline here.
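
The Materials and methods describe a split, embed, and cluster pipeline. The sketch below illustrates the first and last steps under stated assumptions: `split_criteria` is a hypothetical, naive stand-in for the authors' criterion tokenizer (it assumes "Inclusion Criteria" / "Exclusion Criteria" headers and one bulleted or numbered rule per line), and plain k-means is a swapped-in clustering algorithm, since the abstract does not name the method used. Small toy vectors stand in for LLM sentence embeddings.

```python
import re
from typing import List, Sequence, Tuple

# Leading bullet ("-", "*", "•") or list number ("1." / "1)") to strip from a rule.
BULLET = re.compile(r"^(?:[-*\u2022]|\d+[.)])\s*")


def split_criteria(section: str) -> List[Tuple[str, str]]:
    """Split an eligibility section into (kind, rule) pairs, one rule per line."""
    kind = "inclusion"
    rules: List[Tuple[str, str]] = []
    for raw in section.splitlines():
        line = raw.strip()
        lowered = line.lower()
        if lowered.startswith("inclusion criteria"):
            kind = "inclusion"
            continue
        if lowered.startswith("exclusion criteria"):
            kind = "exclusion"
            continue
        rule = BULLET.sub("", line).strip()
        if rule:
            rules.append((kind, rule))
    return rules


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: List[List[float]], k: int, iters: int = 20):
    """Plain k-means with deterministic initialization (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

In a real pipeline each rule string would be embedded by a sentence-embedding model before clustering; the cluster identifiers then act as a compressed vocabulary of criteria that can be compared across trials.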

Citations: 0

Descriptive epidemiology demonstrating the All of Us database as a versatile resource for the rare and undiagnosed disease community.

IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-23 DOI: 10.1093/jamia/ocae241

Drenen J Magee, Sierra Kicker, Aeisha Thomas

Objective: We aim to demonstrate the versatility of the All of Us database as an important source of rare and undiagnosed disease (RUD) data, because of its large size and range of data types.

Materials and methods: We searched the public data browser, electronic health record (EHR), and several surveys to investigate the prevalence, mental health, healthcare access, and other data of select RUDs.

Results: Several RUDs have participants in All of Us [eg, 75 of 100 rare infectious diseases (RIDs)]. We generated health-related data for undiagnosed disease, sickle cell disease (SCD), cystic fibrosis (CF), and infectious (2 diseases) and chronic (4 diseases) disease pools.

Conclusion: Our results highlight the potential value of All of Us with both data breadth and depth to help identify possible solutions for shared and disease-specific biomedical and other problems such as healthcare access, thus enhancing diagnosis, treatment, prevention, and support for the RUD community.

Citations: 0

An electronic health record metadata-mining approach to identifying patient-level interprofessional clinician teams in the intensive care unit.

IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-17 DOI: 10.1093/jamia/ocae275

Olga Yakusheva, Lara Khadr, Kathryn A Lee, Hannah C Ratliff, Deanna J Marriott, Deena Kelly Costa

Objectives: Advances in health informatics have rapidly expanded the use of big-data analytics and electronic health records (EHR) by clinical researchers seeking to optimize interprofessional ICU team care. This study developed and validated a program for extracting, from EHR event logs, the interprofessional team assigned to each patient on each shift.

Materials and methods: We retrospectively analyzed EHR event logs for mechanically ventilated patients aged 18 and older from 5 ICUs in an academic medical center from January 1, 2018, to December 31, 2019. We defined interprofessional teams as all medical providers (physicians, physician assistants, and nurse practitioners), registered nurses, and respiratory therapists assigned to each patient on each shift. We created an EHR event log-mining program that extracts the clinicians who interact with each patient's medical record during each shift. The algorithm was validated using the Message Understanding Conference (MUC-6) method against manual chart review, by two independent reviewers, of a random sample of 200 patient-shifts from each ICU.

Results: Our sample included 4559 ICU encounters and 72 846 patient-shifts. Our program extracted 3288 medical providers, 2702 registered nurses, and 219 respiratory therapists linked to these encounters. Eighty-three percent of patient-shift teams included medical providers, 99.3% included registered nurses, and 74.1% included respiratory therapists; 63.4% of shift-level teams included clinicians from all three professions. The program demonstrated 95.9% precision, 96.2% recall, and high face validity.

Discussion: Our EHR event log-mining program has high precision, recall, and validity for identifying the shift-level interprofessional team caring for each patient in ICUs.

Conclusions: Algorithmic and artificial intelligence approaches have a strong potential for informing research to optimize patient team assignments and improve ICU care and outcomes.
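
The Materials and methods describe grouping EHR access events into patient-shift teams and validating them against chart review. The sketch below shows one way such grouping could work, assuming 12-hour shifts starting at 07:00 and 19:00, an event log of (patient_id, clinician_id, role, timestamp) tuples, and the hypothetical role codes "provider", "rn", and "rt"; none of these details come from the abstract. The precision/recall helper is a simplified set comparison, not the MUC-6 scoring procedure the authors used.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple

ShiftKey = Tuple[str, str, str]  # (patient_id, shift start date ISO, "day" | "night")
Member = Tuple[str, str]  # (clinician_id, role)
REQUIRED_ROLES = {"provider", "rn", "rt"}  # hypothetical role codes


def shift_of(ts: datetime) -> Tuple[str, str]:
    """Map a timestamp to a 12-hour shift (assumed to start at 07:00 and 19:00)."""
    if 7 <= ts.hour < 19:
        return ts.date().isoformat(), "day"
    # Events from 00:00 to 06:59 belong to the night shift that began the previous evening.
    start = ts.date() if ts.hour >= 19 else ts.date() - timedelta(days=1)
    return start.isoformat(), "night"


def build_teams(events: Iterable[Tuple[str, str, str, datetime]]) -> Dict[ShiftKey, Set[Member]]:
    """Collect every clinician who touched a patient's record, per patient-shift."""
    teams: Dict[ShiftKey, Set[Member]] = defaultdict(set)
    for patient, clinician, role, ts in events:
        day, shift = shift_of(ts)
        teams[(patient, day, shift)].add((clinician, role))
    return dict(teams)


def has_all_professions(team: Set[Member]) -> bool:
    """True if the team includes a provider, a registered nurse, and a respiratory therapist."""
    return REQUIRED_ROLES <= {role for _, role in team}


def precision_recall(extracted: Set, reviewed: Set) -> Tuple[float, float]:
    """Set-based precision and recall of extracted members against chart review."""
    tp = len(extracted & reviewed)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(reviewed) if reviewed else 0.0
    return precision, recall
```

Using sets means repeated record accesses by the same clinician within a shift collapse to one team membership, which matches the team definition above (who was assigned, not how often they clicked).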

Citations: 0

Variations in digital health literacy for pediatric caregivers of hospitalized children: implications for digital health equity.

IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-17 DOI: 10.1093/jamia/ocae305

Steven Crook, Glenn Rosenbluth, David V Glidden, Alicia Fernandez, Chuan-Mei Lee, Lizette Avina, Leslie Magana, Kiana Washington, Naomi S Bardach

Objectives: We sought to assess whether race, ethnicity, and preferred language were associated with digital health literacy in pediatric caregivers.

Materials and methods: We used linear regression to measure associations between 3 eHealth Literacy Questionnaire (eHLQ) domains (score range: 1-4) and demographic characteristics.

Results: Non-Latinx White respondents (n = 230) had highest adjusted mean eHLQ scores: 3.44 (95% confidence interval: 3.36-3.52) in "Ability to engage," 3.39 (3.31 to 3.47) in "Feel safe and in control," and 3.34 (3.25 to 3.41) in "Motivated." By contrast, Spanish-preferring Latinx respondents (n = 246) had lower adjusted mean scores across all 3 eHLQ domains: 2.97 (P < .0001), 3.21 (P = .004), and 3.19 (P = .033), respectively.

Discussion: Our study contributes insights in variations across ethnoracial and language preference groups by different eHLQ domains, with implications for addressing digital health inequities.

Conclusion: Digital health literacy was lower in Spanish-preferring Latinx pediatric caregivers than in non-Latinx White caregivers across all 3 eHLQ domains. It was also lower than in English-preferring Latinx caregivers in the "Ability to engage" domain.
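
The Results report "adjusted mean" eHLQ scores from linear regression. As a hedged illustration of where such numbers come from, the sketch fits ordinary least squares with one binary group indicator and one continuous covariate, then predicts each group's score at the covariate's sample mean. The abstract does not state the study's covariates, reference groups, or software; the single covariate and the toy data below are hypothetical and chosen so the fit is exact.

```python
from typing import Dict, List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def adjusted_means(
    group: Sequence[int], covariate: Sequence[float], score: Sequence[float]
) -> Dict[int, float]:
    """Fit score ~ 1 + group + covariate; return each group's prediction at the covariate mean."""
    X = [[1.0, float(g), float(c)] for g, c in zip(group, covariate)]
    b0, b_group, b_cov = ols(X, score)
    c_bar = sum(covariate) / len(covariate)
    return {0: b0 + b_cov * c_bar, 1: b0 + b_group + b_cov * c_bar}
```

Evaluating both groups at the same covariate value is what makes the means "adjusted": the remaining gap between them equals the group coefficient, net of the covariate.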

Citations: 0

Clinical implementation of preemptive pharmacogenomics testing for personalized medicine at an academic medical center.

IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-12 DOI: 10.1093/jamia/ocae293

Bani Tamraz, Jaekyu Shin, Raman Khanna, Jessica Van Ziffle, Susan Knowles, Susan Stregowski, Eunice Wan, Rajesh Kamath, Christopher Collins, Choeying Phunsur, Benjamin Tsai, Patsy Kong, Clari Calanoc, Aleta Pollard, Rajeev Sawhney, Jennifer Pleiman, Walter Patrick Devine, Rhiannon Croci, Aparna Sashikanth, Lisa Kroon, Russell Cucina, Aleks Rajkovic

Objective: This article describes the implementation of preemptive clinical pharmacogenomics (PGx) testing linked to an automated clinical decision support (CDS) system delivering actionable PGx information to clinicians at the point of care at UCSF Health, a large Academic Medical Center.

Methods: A multidisciplinary team developed the strategic vision for the PGx program. Drug-gene interactions of interest were compiled, and actionable alleles identified. A genotyping platform was selected and validated in-house. Following HIPAA protocols, genotype results were electronically transferred and stored in electronic health records (EHRs). CDS was developed and integrated with electronic prescribing.

Results: We developed a customized PGx program for 56 medications and 15 genes. Two hundred thirty-three pharmacogenomic prescribing alerts and 15 pharmacogenomic testing prompts, approved by clinicians, were built into the EHR to deliver actionable clinical PGx information to clinicians.

Conclusions: Our multidisciplinary team successfully implemented preemptive PGx testing linked to point-of-care CDS to guide clinicians in precise medication decision-making.
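
The Results distinguish prescribing alerts (a genotype on file makes a drug-gene interaction actionable) from testing prompts (no result on file yet). The sketch below shows that distinction as a simple rule lookup at order entry. It is a hypothetical illustration, not UCSF's CDS logic: the rule table, phenotype strings, and message wording are assumptions, and clopidogrel with CYP2C19 appears only as a familiar illustrative drug-gene pair.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    gene: str
    phenotype: str  # phenotype that makes the interaction actionable
    message: str


# Hypothetical rule table; a real program would hold clinician-approved rules.
PRESCRIBING_RULES: Dict[str, Tuple[Rule, ...]] = {
    "clopidogrel": (
        Rule(
            "CYP2C19",
            "poor metabolizer",
            "Reduced activation of this prodrug expected; review therapy per institutional PGx guidance.",
        ),
    ),
}


def check_order(drug: str, phenotypes: Dict[str, str]) -> List[str]:
    """Return CDS messages for a medication order given the patient's stored PGx phenotypes."""
    messages: List[str] = []
    for rule in PRESCRIBING_RULES.get(drug.lower(), ()):
        phenotype = phenotypes.get(rule.gene)
        if phenotype is None:
            # No genotype result in the EHR: prompt for testing instead of alerting.
            messages.append(
                f"TESTING PROMPT: no {rule.gene} result on file for {drug}; consider PGx testing."
            )
        elif phenotype == rule.phenotype:
            messages.append(f"PRESCRIBING ALERT ({rule.gene} {phenotype}): {rule.message}")
    return messages
```

Storing results preemptively, before any order is placed, is what lets this lookup run at the point of care without waiting for a new test.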

Citations: 0

Using large language models to detect outcomes in qualitative studies of adolescent depression.

IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-11 DOI: 10.1093/jamia/ocae298

Alison W Xin, Dylan M Nielson, Karolin Rose Krause, Guilherme Fiorini, Nick Midgley, Francisco Pereira, Juan Antonio Lossio-Ventura

Objective: We aim to use large language models (LLMs) to detect mentions of more nuanced psychotherapeutic outcomes and impacts than previously considered in transcripts of interviews with adolescents with depression. Our clinical authors previously created a novel coding framework, based on qualitative analysis embedded within a clinical study of depression, that contains fine-grained therapy outcomes beyond binary classification (eg, depression vs control). Moreover, we seek to demonstrate that embeddings from LLMs are informative enough to accurately label these experiences.

Materials and methods: Data were drawn from interviews in which text segments were annotated with different outcome labels. Five open-source LLMs were evaluated for classifying outcomes from the coding framework. Classification experiments were carried out on the original interview transcripts. We then repeated these experiments on 2 alternative versions of the data: one produced by breaking the segments into conversation turns, and one keeping only non-interviewer utterances (monologues).
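The two alternative segmentations can be sketched as follows. This assumes each annotated segment is a list of `(speaker, text)` utterances and that the interviewer is labeled `"Interviewer"`. The data layout, the speaker label, and the rule that a turn is a maximal run of consecutive utterances by one speaker are all assumptions, not details given in the abstract. Each derived text would presumably inherit its parent segment's outcome labels.

```python
from itertools import groupby

# Assumed layout: one utterance = (speaker, text).
Utterance = tuple[str, str]


def split_into_turns(segment: list[Utterance]) -> list[Utterance]:
    """Merge consecutive utterances by the same speaker into one turn (assumed turn definition)."""
    turns: list[Utterance] = []
    for speaker, group in groupby(segment, key=lambda u: u[0]):
        turns.append((speaker, " ".join(text for _, text in group)))
    return turns


def monologue(segment: list[Utterance], interviewer: str = "Interviewer") -> str:
    """Keep only non-interviewer speech, joined into one text (assumed speaker label)."""
    return " ".join(text for speaker, text in segment if speaker != interviewer)


# Usage with a made-up segment:
segment = [
    ("Interviewer", "How have things been?"),
    ("Participant", "Better."),
    ("Participant", "I see my friends again."),
    ("Interviewer", "And school?"),
    ("Participant", "Still hard."),
]
print(split_into_turns(segment))
print(monologue(segment))
```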

Results: We used classification models to predict 31 outcomes and 8 derived labels for 3 different text segmentations. Area under the ROC curve (AUROC) scores ranged from 0.6 to 0.9 for the original segmentation and from 0.7 to 1.0 for the monologue and turn segmentations.
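The AUROC values above can be read as the probability that a randomly chosen positive text receives a higher classifier score than a randomly chosen negative one. The sketch below computes that definition directly for a single outcome label, with ties counted as one half. It is the generic metric, not the authors' evaluation code. Because it is quadratic in the number of examples, it only suits small evaluation sets.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Calling `auroc` once per outcome label, each with its own score vector, yields the kind of per-outcome range reported above.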

Discussion: LLM-based classification models could identify outcomes important to adolescents, such as friendships or academic and vocational functioning, in text transcripts of patient interviews. By using clinical data, we also aim to generalize better to clinical settings than studies based on public social media data.

Conclusion: Our results demonstrate that fine-grained therapy outcome coding in psychotherapeutic text is feasible and can support the quantification of important outcomes for downstream uses.

Citations: 0

Empowering the biomedical research community: Innovative SAS deployment on the All of Us Researcher Workbench.

IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-01 DOI: 10.1093/jamia/ocae216

Izabelle Humes, Cathy Shyr, Moira Dillon, Zhongjie Liu, Jennifer Peterson, Chris St Jeor, Jacqueline Malkes, Hiral Master, Brandy Mapes, Romuladus Azuine, Nakia Mack, Bassent Abdelbary, Joyonna Gamble-George, Emily Goldmann, Stephanie Cook, Fatemeh Choupani, Rubin Baskir, Sydney McMaster, Chris Lunt, Karriem Watson, Minnkyong Lee, Sophie Schwartz, Ruchi Munshi, David Glazer, Eric Banks, Anthony Philippakis, Melissa Basford, Dan Roden, Paul A Harris

Objectives: The All of Us Research Program is a precision medicine initiative aimed at establishing a vast, diverse biomedical database accessible through a cloud-based data analysis platform, the Researcher Workbench (RW). Our goal was to empower the research community by co-designing the implementation of SAS in the RW alongside researchers to enable broader use of All of Us data.

Materials and methods: Researchers from various fields and with different SAS experience levels participated in co-designing the SAS implementation through user experience interviews.

Results: Feedback and lessons learned from user testing informed the final design of the SAS application.

Discussion: The co-design approach is critical for reducing technical barriers, broadening All of Us data use, and enhancing the user experience for data analysis on the RW.

Conclusion: Our co-design approach successfully tailored the implementation of the SAS application to researchers' needs. This approach may inform future software implementations on the RW.

Pages : 2994-3000 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631098/pdf/

Citations: 0

Multi-modality risk prediction of cardiovascular diseases for breast cancer cohort in the All of Us Research Program.

IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2024-12-01 DOI: 10.1093/jamia/ocae199

Han Yang, Sicheng Zhou, Zexi Rao, Chen Zhao, Erjia Cui, Chetan Shenoy, Anne H Blaes, Nishitha Paidimukkala, Jinhua Wang, Jue Hou, Rui Zhang

Objective: This study leverages the rich diversity of the All of Us Research Program (All of Us) dataset to devise a predictive model for cardiovascular disease (CVD) in breast cancer (BC) survivors. Central to this endeavor is a robust data integration pipeline that synthesizes electronic health records (EHRs), patient surveys, and genomic data while upholding fairness across demographic variables.

Materials and methods: We developed a universal data-wrangling pipeline to process and merge the heterogeneous data sources of the All of Us dataset, address missingness and variance in the data, and align disparate data modalities into a coherent framework for analysis. Using a composite feature set that includes EHR, lifestyle, and social determinants of health (SDoH) data, we then employed Adaptive Lasso and Random Forest regression models to predict 6 CVD outcomes. The models were evaluated using the c-index and the time-dependent area under the receiver operating characteristic curve (AUROC) over a 10-year period.
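Here is a dependency-free sketch of the two evaluation metrics named above, with some simplifications. Harrell's c-index here skips pairs with tied times. The time-dependent AUROC is the naive cumulative/dynamic estimator: it drops subjects censored before the horizon instead of applying inverse-probability-of-censoring weights, which can bias it. Function names are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool], risks: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs where the earlier event has the higher risk.

    A pair (i, j) is comparable when i had an observed event strictly before
    j's time (event or censoring). Risk ties count 1/2; tied times are skipped.
    """
    concordant, comparable = 0.0, 0
    for t_i, e_i, r_i in zip(times, events, risks):
        if not e_i:
            continue  # censored subjects cannot anchor a comparable pair
        for t_j, r_j in zip(times, risks):
            if t_i < t_j:
                comparable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def naive_time_dependent_auc(times: Sequence[float], events: Sequence[bool],
                             risks: Sequence[float], horizon: float) -> float:
    """Cumulative/dynamic AUC at `horizon` WITHOUT censoring weights.

    Cases: observed event at or before the horizon. Controls: still
    event-free after it. Subjects censored before the horizon are dropped.
    """
    cases = [r for t, e, r in zip(times, events, risks) if t <= horizon and e]
    controls = [r for t, r in zip(times, risks) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data: times in years, event flags, model risk scores (made up).
times, events, risks = [1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.2, 0.3, 0.1]
print(harrell_c_index(times, events, risks))
print(naive_time_dependent_auc(times, events, risks, horizon=2))
```

Evaluating `naive_time_dependent_auc` at a grid of horizons up to 10 years would give a curve analogous to the time-dependent AUROC described above. A production analysis would normally use an IPCW-weighted estimator from a survival-analysis library.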

Results: The Adaptive Lasso model showed consistent performance across most CVD outcomes, while the Random Forest model excelled particularly in predicting outcomes such as transient ischemic attack when incorporating the full multimodal feature set. Feature importance analysis identified age and previous coronary events as dominant predictors across CVD outcomes, with SDoH clustering labels highlighting the nuanced impact of social factors.

Discussion: The development of both a Cox-based predictive model and a Random Forest regression model represents an extensive application of All of Us data in integrating EHRs and patient surveys to enhance precision medicine. The inclusion of SDoH clustering labels revealed the significant impact of sociobehavioral factors on patient outcomes, emphasizing the importance of comprehensive health determinants in predictive models. Despite these advances, limitations include the exclusion of genetic data, the broad categorization of CVD conditions, and the need for fairness analyses to ensure equitable model performance across diverse populations. Future work should refine clinical and social variable measurements, incorporate advanced imputation techniques, and explore additional predictive algorithms to enhance model precision and fairness.

Conclusion: This study demonstrates the viability of the diverse All of Us dataset for developing a multi-modality predictive model of CVD in BC survivors, supporting risk stratification in oncological survivorship. The data integration pipeline and subsequent predictive models establish a methodological foundation for future research into personalized healthcare.

Pages : 2800-2810 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631116/pdf/

Citations: 0