{"title":"Correction to: Are medical history data fit for risk stratification of patients with chest pain in emergency care? Comparing data collected from patients using computerized history taking with data documented by physicians in the electronic health record in the CLEOS-CPDS prospective cohort study.","authors":"","doi":"10.1093/jamia/ocae252","DOIUrl":"10.1093/jamia/ocae252","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"261-263"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648703/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142331358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mary S Kim, Beomseok Park, Genevieve J Sippel, Aaron H Mun, Wanzhao Yang, Kathleen H McCarthy, Emely Fernandez, Marius George Linguraru, Aleksandra Sarcevic, Ivan Marsic, Randall S Burd
Objectives: Human monitoring of personal protective equipment (PPE) adherence among healthcare providers has several limitations, including the need for additional personnel during staff shortages and decreased vigilance during prolonged tasks. To address these challenges, we developed an automated computer vision system for monitoring PPE adherence in healthcare settings. We assessed the system performance against human observers detecting nonadherence in a video surveillance experiment.
Materials and methods: The automated system was trained to detect 15 classes of eyewear, masks, gloves, and gowns using an object detector and tracker. To assess how the system performs compared to human observers in detecting nonadherence, we designed a video surveillance experiment under 2 conditions: variations in video durations (20, 40, and 60 seconds) and the number of individuals in the videos (3 versus 6). Twelve nurses participated as human observers. Performance was assessed based on the number of detections of nonadherence.
Results: Human observers detected fewer instances of nonadherence than the system (parameter estimate -0.3, 95% CI -0.4 to -0.2, P < .001). Human observers detected more nonadherence during longer video durations (parameter estimate 0.7, 95% CI 0.4-1.0, P < .001). The system achieved a sensitivity of 0.86, specificity of 1, and Matthews correlation coefficient of 0.82 for detecting PPE nonadherence.
Discussion: An automated system simultaneously tracks multiple objects and individuals. The system performance is also independent of observation duration, an improvement over human monitoring.
Conclusion: The automated system presents a potential solution for scalable monitoring of hospital-wide infection control practices and improving PPE usage in healthcare settings.
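The Results above report sensitivity, specificity, and the Matthews correlation coefficient (MCC). As a minimal sketch of how these three metrics relate to a binary confusion matrix, the following Python snippet computes them from raw counts. The function name `binary_metrics` and the counts used are hypothetical and chosen only for illustration; they are not the study's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, mcc) from confusion-matrix counts.

    Hypothetical helper for illustration; not taken from the study.
    """
    # Sensitivity: fraction of true nonadherence events that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of adherent cases correctly left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Illustrative counts only (assumed, not reported in the abstract).
sens, spec, mcc = binary_metrics(tp=8, fp=1, tn=9, fn=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} mcc={mcc:.2f}")
```

Unlike sensitivity or specificity alone, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside them when classes are imbalanced.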
{"title":"Comparative analysis of personal protective equipment nonadherence detection: computer vision versus human observers.","authors":"Mary S Kim, Beomseok Park, Genevieve J Sippel, Aaron H Mun, Wanzhao Yang, Kathleen H McCarthy, Emely Fernandez, Marius George Linguraru, Aleksandra Sarcevic, Ivan Marsic, Randall S Burd","doi":"10.1093/jamia/ocae262","DOIUrl":"10.1093/jamia/ocae262","url":null,"abstract":"<p><strong>Objectives: </strong>Human monitoring of personal protective equipment (PPE) adherence among healthcare providers has several limitations, including the need for additional personnel during staff shortages and decreased vigilance during prolonged tasks. To address these challenges, we developed an automated computer vision system for monitoring PPE adherence in healthcare settings. We assessed the system performance against human observers detecting nonadherence in a video surveillance experiment.</p><p><strong>Materials and methods: </strong>The automated system was trained to detect 15 classes of eyewear, masks, gloves, and gowns using an object detector and tracker. To assess how the system performs compared to human observers in detecting nonadherence, we designed a video surveillance experiment under 2 conditions: variations in video durations (20, 40, and 60 seconds) and the number of individuals in the videos (3 versus 6). Twelve nurses participated as human observers. Performance was assessed based on the number of detections of nonadherence.</p><p><strong>Results: </strong>Human observers detected fewer instances of nonadherence than the system (parameter estimate -0.3, 95% CI -0.4 to -0.2, P < .001). Human observers detected more nonadherence during longer video durations (parameter estimate 0.7, 95% CI 0.4-1.0, P < .001). 
The system achieved a sensitivity of 0.86, specificity of 1, and Matthew's correlation coefficient of 0.82 for detecting PPE nonadherence.</p><p><strong>Discussion: </strong>An automated system simultaneously tracks multiple objects and individuals. The system performance is also independent of observation duration, an improvement over human monitoring.</p><p><strong>Conclusion: </strong>The automated system presents a potential solution for scalable monitoring of hospital-wide infection control practices and improving PPE usage in healthcare settings.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"163-171"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648733/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142479222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tiffani J Bright, Oliver J Bear Don't Walk Iv, Carl Erwin Johnson, Carolyn Petersen, Patricia C Dykes, Krista G Martin, Kevin B Johnson, Lois Walters-Threat, Catherine K Craven, Robert J Lucero, Gretchen P Jackson, Rubina F Rizvi
Objective: The American Medical Informatics Association (AMIA) Task Force on Diversity, Equity, and Inclusion (DEI) was established to address systemic racism and health disparities in biomedical and health informatics, aligning with AMIA's mission to transform healthcare. AMIA's DEI initiatives were spurred by member voices responding to police brutality and COVID-19's impact on Black/African American communities.
Materials and methods: The Task Force, consisting of 20 members across 3 groups aligned with AMIA's 2020-2025 Strategic Plan, met biweekly to develop DEI recommendations with the help of 16 additional volunteers. These recommendations were reviewed, prioritized, and presented to the AMIA Board of Directors for approval.
Results: In 9 months, the Task Force (1) created a logic model to support workforce diversity and raise AMIA's DEI awareness, (2) conducted an environmental scan of other associations' DEI activities, (3) developed a DEI framework for AMIA meetings, (4) gathered member feedback, (5) cultivated DEI educational resources, (6) created a Board nominations and diversity session, (7) reviewed the Board's Strategic Planning for DEI alignment, (8) led a program to increase diversity at the 2020 AMIA Virtual Annual Symposium, and (9) standardized socially-assigned race and ethnicity data collection.
Discussion: The Task Force proposed actionable recommendations that focused on AMIA's role in addressing systemic racism and health equity, helping the organization understand its member diversity.
Conclusion: This work supported marginalized groups, broadened the research agenda, and positioned AMIA as a DEI leader while reinforcing the need for ongoing transformation within informatics.
{"title":"The journey to building a diverse, equitable, and inclusive American Medical Informatics Association.","authors":"Tiffani J Bright, Oliver J Bear Don't Walk Iv, Carl Erwin Johnson, Carolyn Petersen, Patricia C Dykes, Krista G Martin, Kevin B Johnson, Lois Walters-Threat, Catherine K Craven, Robert J Lucero, Gretchen P Jackson, Rubina F Rizvi","doi":"10.1093/jamia/ocae258","DOIUrl":"10.1093/jamia/ocae258","url":null,"abstract":"<p><strong>Objective: </strong>The American Medical Informatics Association (AMIA) Task Force on Diversity, Equity, and Inclusion (DEI) was established to address systemic racism and health disparities in biomedical and health informatics, aligning with AMIA's mission to transform healthcare. AMIA's DEI initiatives were spurred by member voices responding to police brutality and COVID-19's impact on Black/African American communities.</p><p><strong>Materials and methods: </strong>The Task Force, consisting of 20 members across 3 groups aligned with AMIA's 2020-2025 Strategic Plan, met biweekly to develop DEI recommendations with the help of 16 additional volunteers. 
These recommendations were reviewed, prioritized, and presented to the AMIA Board of Directors for approval.</p><p><strong>Results: </strong>In 9 months, the Task Force (1) created a logic model to support workforce diversity and raise AMIA's DEI awareness, (2) conducted an environmental scan of other associations' DEI activities, (3) developed a DEI framework for AMIA meetings, (4) gathered member feedback, (5) cultivated DEI educational resources, (6) created a Board nominations and diversity session, (7) reviewed the Board's Strategic Planning for DEI alignment, (8) led a program to increase diversity at the 2020 AMIA Virtual Annual Symposium, and (9) standardized socially-assigned race and ethnicity data collection.</p><p><strong>Discussion: </strong>The Task Force proposed actionable recommendations that focused on AMIA's role in addressing systemic racism and health equity, helping the organization understand its member diversity.</p><p><strong>Conclusion: </strong>This work supported marginalized groups, broadened the research agenda, and positioned AMIA as a DEI leader while reinforcing the need for ongoing transformation within informatics.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"3-8"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648708/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142479236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chelsea Richwine, Vaishali Patel, Jordan Everson, Bradley Iott
Objectives: To understand how health-related social needs (HRSN) data are collected at US hospitals and implications for use.
Materials and methods: Using 2023 nationally representative survey data on US hospitals (N = 2775), we described hospitals' routine and structured collection and use of HRSN data and examined the relationship between methods of data collection and specific uses. Multivariate logistic regression was used to identify characteristics associated with data collection and use and understand how methods of data collection relate to use.
Results: In 2023, 88% of hospitals collected HRSN data (64% routinely, 72% structured). While hospitals commonly used data for internal purposes (eg, discharge planning, 79%), those that collected data routinely and in a structured format (58%) used data for purposes involving coordination or exchange with other organizations (eg, making referrals, 74%) at higher rates than hospitals that collected data non-routinely or in a non-structured format (eg, 93% vs 67% for referrals, P < .05). In multivariate regression, routine and structured data collection was positively associated with all uses of data examined. Hospital location, ownership, system affiliation, value-based care participation, and critical access designation were associated with HRSN data collection, but only system affiliation was consistently (positively) associated with use.
Discussion: While most hospitals screen for social needs, fewer collect data routinely and in a structured format that would facilitate downstream use. Routine and structured data collection was associated with greater use, particularly for secondary purposes.
Conclusion: Routine and structured screening may result in more actionable data that facilitates use for various purposes that support patient care and improve community and population health, indicating the importance of continuing efforts to increase routine screening and standardize HRSN data collection.
{"title":"The role of routine and structured social needs data collection in improving care in US hospitals.","authors":"Chelsea Richwine, Vaishali Patel, Jordan Everson, Bradley Iott","doi":"10.1093/jamia/ocae279","DOIUrl":"10.1093/jamia/ocae279","url":null,"abstract":"<p><strong>Objectives: </strong>To understand how health-related social needs (HRSN) data are collected at US hospitals and implications for use.</p><p><strong>Materials and methods: </strong>Using 2023 nationally representative survey data on US hospitals (N = 2775), we described hospitals' routine and structured collection and use of HRSN data and examined the relationship between methods of data collection and specific uses. Multivariate logistic regression was used to identify characteristics associated with data collection and use and understand how methods of data collection relate to use.</p><p><strong>Results: </strong>In 2023, 88% of hospitals collected HRSN data (64% routinely, 72% structured). While hospitals commonly used data for internal purposes (eg, discharge planning, 79%), those that collected data routinely and in a structured format (58%) used data for purposes involving coordination or exchange with other organizations (eg, making referrals, 74%) at higher rates than hospitals that collected data but not routinely or in a non-structured format (eg, 93% vs 67% for referrals, P< .05). In multivariate regression, routine and structured data collection was positively associated with all uses of data examined. Hospital location, ownership, system-affiliation, value-based care participation, and critical access designation were associated with HRSN data collection, but only system-affiliation was consistently (positively) associated with use.</p><p><strong>Discussion: </strong>While most hospitals screen for social needs, fewer collect data routinely and in a structured format that would facilitate downstream use. 
Routine and structured data collection was associated with greater use, particularly for secondary purposes.</p><p><strong>Conclusion: </strong>Routine and structured screening may result in more actionable data that facilitates use for various purposes that support patient care and improve community and population health, indicating the importance of continuing efforts to increase routine screening and standardize HRSN data collection.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"28-37"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648711/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142591563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: Artificial intelligence for optimizing recruitment and retention in clinical trials: a scoping review.","authors":"","doi":"10.1093/jamia/ocae283","DOIUrl":"10.1093/jamia/ocae283","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"260"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648702/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142583537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is ChatGPT worthy enough for provisioning clinical decision support?","authors":"Partha Pratim Ray","doi":"10.1093/jamia/ocae282","DOIUrl":"10.1093/jamia/ocae282","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"258-259"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648701/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142583648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objectives: This study aims to (1) review machine learning (ML)-based models for early infection diagnostic and prognosis prediction in post-acute care (PAC) settings, (2) identify key risk predictors influencing infection-related outcomes, and (3) examine the quality and limitations of these models.
Materials and methods: PubMed, Web of Science, Scopus, IEEE Xplore, CINAHL, and ACM digital library were searched in February 2024. Eligible studies leveraged PAC data to develop and evaluate ML models for infection-related risks. Data extraction followed the CHARMS checklist. Quality appraisal followed the PROBAST tool. Data synthesis was guided by the socio-ecological conceptual framework.
Results: Thirteen studies were included, mainly focusing on respiratory infections and nursing homes. Most used regression models with structured electronic health record data. Since 2020, there has been a shift toward advanced ML algorithms and multimodal data, with biosensors and clinical notes being significant sources of unstructured data. Despite these advances, there is insufficient evidence to support performance improvements over traditional models. Individual-level risk predictors, such as impaired cognition, declined function, and tachycardia, were commonly used, while contextual-level predictors were barely utilized, consequently limiting model fairness. Major sources of bias included lack of external validation, inadequate model calibration, and insufficient consideration of data complexity.
Discussion and conclusion: Despite the growth of advanced modeling approaches in infection-related models in PAC settings, evidence supporting their superiority remains limited. Future research should leverage a socio-ecological lens for predictor selection and model construction, exploring optimal data modalities and ML model usage in PAC, while ensuring rigorous methodologies and fairness considerations.
{"title":"Machine learning-based infection diagnostic and prognostic models in post-acute care settings: a systematic review.","authors":"Zidu Xu, Danielle Scharp, Mollie Hobensack, Jiancheng Ye, Jungang Zou, Sirui Ding, Jingjing Shang, Maxim Topaz","doi":"10.1093/jamia/ocae278","DOIUrl":"10.1093/jamia/ocae278","url":null,"abstract":"<p><strong>Objectives: </strong>This study aims to (1) review machine learning (ML)-based models for early infection diagnostic and prognosis prediction in post-acute care (PAC) settings, (2) identify key risk predictors influencing infection-related outcomes, and (3) examine the quality and limitations of these models.</p><p><strong>Materials and methods: </strong>PubMed, Web of Science, Scopus, IEEE Xplore, CINAHL, and ACM digital library were searched in February 2024. Eligible studies leveraged PAC data to develop and evaluate ML models for infection-related risks. Data extraction followed the CHARMS checklist. Quality appraisal followed the PROBAST tool. Data synthesis was guided by the socio-ecological conceptual framework.</p><p><strong>Results: </strong>Thirteen studies were included, mainly focusing on respiratory infections and nursing homes. Most used regression models with structured electronic health record data. Since 2020, there has been a shift toward advanced ML algorithms and multimodal data, biosensors, and clinical notes being significant sources of unstructured data. Despite these advances, there is insufficient evidence to support performance improvements over traditional models. Individual-level risk predictors, like impaired cognition, declined function, and tachycardia, were commonly used, while contextual-level predictors were barely utilized, consequently limiting model fairness. 
Major sources of bias included lack of external validation, inadequate model calibration, and insufficient consideration of data complexity.</p><p><strong>Discussion and conclusion: </strong>Despite the growth of advanced modeling approaches in infection-related models in PAC settings, evidence supporting their superiority remains limited. Future research should leverage a socio-ecological lens for predictor selection and model construction, exploring optimal data modalities and ML model usage in PAC, while ensuring rigorous methodologies and fairness considerations.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"241-252"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648729/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alison W Xin, Dylan M Nielson, Karolin Rose Krause, Guilherme Fiorini, Nick Midgley, Francisco Pereira, Juan Antonio Lossio-Ventura
Objective: We aim to use large language models (LLMs) to detect mentions of more nuanced psychotherapeutic outcomes and impacts than previously considered in transcripts of interviews with adolescents with depression. Our clinical authors previously created a novel coding framework containing fine-grained therapy outcomes beyond the binary classification (eg, depression vs control) based on qualitative analysis embedded within a clinical study of depression. Moreover, we seek to demonstrate that embeddings from LLMs are informative enough to accurately label these experiences.
Materials and methods: Data were drawn from interviews, where text segments were annotated with different outcome labels. Five different open-source LLMs were evaluated to classify outcomes from the coding framework. Classification experiments were carried out on the original interview transcripts. Furthermore, we repeated those experiments for versions of the data produced by breaking those segments into conversation turns, or by keeping only non-interviewer utterances (monologues).
Results: We used classification models to predict 31 outcomes and 8 derived labels, for 3 different text segmentations. Area under the ROC curve scores ranged between 0.6 and 0.9 for the original segmentation and 0.7 and 1.0 for the monologues and turns.
Discussion: LLM-based classification models could identify outcomes important to adolescents, such as friendships or academic and vocational functioning, in text transcripts of patient interviews. By using clinical data, we also aim to better generalize to clinical settings compared to studies based on public social media data.
Conclusion: Our results demonstrate that fine-grained therapy outcome coding in psychotherapeutic text is feasible, and can be used to support the quantification of important outcomes for downstream uses.
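The Results above are reported as area under the ROC curve (AUC) per outcome label. As a minimal sketch of what that score measures, the snippet below computes AUC for a single binary label as the probability that a randomly chosen positive segment is scored higher than a randomly chosen negative one (ties count as one half). The function name `roc_auc` and the toy labels and scores are assumptions for illustration; the abstract does not describe the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC for one binary label via pairwise comparison of scores.

    Hypothetical helper for illustration; O(n_pos * n_neg) time.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data (assumed): 1 = segment annotated with the outcome, 0 = not.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.3f}")
```

With many outcome labels, a score like this would typically be computed once per label (one-vs-rest), which is consistent with reporting a range of AUC values across outcomes.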
{"title":"Using large language models to detect outcomes in qualitative studies of adolescent depression.","authors":"Alison W Xin, Dylan M Nielson, Karolin Rose Krause, Guilherme Fiorini, Nick Midgley, Francisco Pereira, Juan Antonio Lossio-Ventura","doi":"10.1093/jamia/ocae298","DOIUrl":"https://doi.org/10.1093/jamia/ocae298","url":null,"abstract":"<p><strong>Objective: </strong>We aim to use large language models (LLMs) to detect mentions of nuanced psychotherapeutic outcomes and impacts than previously considered in transcripts of interviews with adolescent depression. Our clinical authors previously created a novel coding framework containing fine-grained therapy outcomes beyond the binary classification (eg, depression vs control) based on qualitative analysis embedded within a clinical study of depression. Moreover, we seek to demonstrate that embeddings from LLMs are informative enough to accurately label these experiences.</p><p><strong>Materials and methods: </strong>Data were drawn from interviews, where text segments were annotated with different outcome labels. Five different open-source LLMs were evaluated to classify outcomes from the coding framework. Classification experiments were carried out in the original interview transcripts. Furthermore, we repeated those experiments for versions of the data produced by breaking those segments into conversation turns, or keeping non-interviewer utterances (monologues).</p><p><strong>Results: </strong>We used classification models to predict 31 outcomes and 8 derived labels, for 3 different text segmentations. Area under the ROC curve scores ranged between 0.6 and 0.9 for the original segmentation and 0.7 and 1.0 for the monologues and turns.</p><p><strong>Discussion: </strong>LLM-based classification models could identify outcomes important to adolescents, such as friendships or academic and vocational functioning, in text transcripts of patient interviews. 
By using clinical data, we also aim to better generalize to clinical settings compared to studies based on public social media data.</p><p><strong>Conclusion: </strong>Our results demonstrate that fine-grained therapy outcome coding in psychotherapeutic text is feasible, and can be used to support the quantification of important outcomes for downstream uses.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142814632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Izabelle Humes, Cathy Shyr, Moira Dillon, Zhongjie Liu, Jennifer Peterson, Chris St Jeor, Jacqueline Malkes, Hiral Master, Brandy Mapes, Romuladus Azuine, Nakia Mack, Bassent Abdelbary, Joyonna Gamble-George, Emily Goldmann, Stephanie Cook, Fatemeh Choupani, Rubin Baskir, Sydney McMaster, Chris Lunt, Karriem Watson, Minnkyong Lee, Sophie Schwartz, Ruchi Munshi, David Glazer, Eric Banks, Anthony Philippakis, Melissa Basford, Dan Roden, Paul A Harris
Objectives: The All of Us Research Program is a precision medicine initiative aimed at establishing a vast, diverse biomedical database accessible through a cloud-based data analysis platform, the Researcher Workbench (RW). Our goal was to empower the research community by co-designing the implementation of SAS in the RW alongside researchers to enable broader use of All of Us data.
Materials and methods: Researchers from various fields and with different SAS experience levels participated in co-designing the SAS implementation through user experience interviews.
Results: Feedback and lessons learned from user testing informed the final design of the SAS application.
Discussion: The co-design approach is critical for reducing technical barriers, broadening All of Us data use, and enhancing the user experience for data analysis on the RW.
Conclusion: Our co-design approach successfully tailored the implementation of the SAS application to researchers' needs. This approach may inform future software implementations on the RW.
{"title":"Empowering the biomedical research community: Innovative SAS deployment on the All of Us Researcher Workbench.","authors":"Izabelle Humes, Cathy Shyr, Moira Dillon, Zhongjie Liu, Jennifer Peterson, Chris St Jeor, Jacqueline Malkes, Hiral Master, Brandy Mapes, Romuladus Azuine, Nakia Mack, Bassent Abdelbary, Joyonna Gamble-George, Emily Goldmann, Stephanie Cook, Fatemeh Choupani, Rubin Baskir, Sydney McMaster, Chris Lunt, Karriem Watson, Minnkyong Lee, Sophie Schwartz, Ruchi Munshi, David Glazer, Eric Banks, Anthony Philippakis, Melissa Basford, Dan Roden, Paul A Harris","doi":"10.1093/jamia/ocae216","DOIUrl":"10.1093/jamia/ocae216","url":null,"abstract":"<p><strong>Objectives: </strong>The All of Us Research Program is a precision medicine initiative aimed at establishing a vast, diverse biomedical database accessible through a cloud-based data analysis platform, the Researcher Workbench (RW). Our goal was to empower the research community by co-designing the implementation of SAS in the RW alongside researchers to enable broader use of All of Us data.</p><p><strong>Materials and methods: </strong>Researchers from various fields and with different SAS experience levels participated in co-designing the SAS implementation through user experience interviews.</p><p><strong>Results: </strong>Feedback and lessons learned from user testing informed the final design of the SAS application.</p><p><strong>Discussion: </strong>The co-design approach is critical for reducing technical barriers, broadening All of Us data use, and enhancing the user experience for data analysis on the RW.</p><p><strong>Conclusion: </strong>Our co-design approach successfully tailored the implementation of the SAS application to researchers' needs. 
This approach may inform future software implementations on the RW.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"2994-3000"},"PeriodicalIF":4.7,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631098/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141972205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Han Yang, Sicheng Zhou, Zexi Rao, Chen Zhao, Erjia Cui, Chetan Shenoy, Anne H Blaes, Nishitha Paidimukkala, Jinhua Wang, Jue Hou, Rui Zhang
Objective: This study leverages the rich diversity of the All of Us Research Program (All of Us) dataset to devise a predictive model for cardiovascular disease (CVD) in breast cancer (BC) survivors. Central to this endeavor is the creation of a robust data integration pipeline that synthesizes electronic health records (EHRs), patient surveys, and genomic data, while upholding fairness across demographic variables.
Materials and methods: We developed a universal data wrangling pipeline to process and merge the heterogeneous data sources of the All of Us dataset, address missingness and variance in the data, and align disparate data modalities into a coherent framework for analysis. Using a composite feature set including EHR, lifestyle, and social determinants of health (SDoH) data, we then employed Adaptive Lasso and Random Forest regression models to predict 6 CVD outcomes. The models were evaluated using the c-index and the time-dependent area under the receiver operating characteristic curve over a 10-year period.
Results: The Adaptive Lasso model showed consistent performance across most CVD outcomes, while the Random Forest model performed particularly well in predicting outcomes such as transient ischemic attack when the full multi-modality feature set was incorporated. Feature importance analysis revealed age and previous coronary events as dominant predictors across CVD outcomes, with SDoH clustering labels highlighting the nuanced impact of social factors.
Discussion: The development of both a Cox-based predictive model and a Random Forest regression model illustrates how All of Us data can be applied, by integrating EHR and patient survey data, to advance precision medicine. The inclusion of SDoH clustering labels revealed the significant impact of sociobehavioral factors on patient outcomes, emphasizing the importance of comprehensive health determinants in predictive models. Despite these advancements, limitations include the exclusion of genetic data, broad categorization of CVD conditions, and the need for fairness analyses to ensure equitable model performance across diverse populations. Future work should refine clinical and social variable measurements, incorporate advanced imputation techniques, and explore additional predictive algorithms to enhance model precision and fairness.
Conclusion: This study demonstrates the utility of the diverse All of Us dataset for developing a multi-modality predictive model for CVD risk stratification in BC survivors. The data integration pipeline and subsequent predictive models establish a methodological foundation for future research into personalized healthcare.
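The c-index named in the Materials and methods as an evaluation metric can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption made for illustration: the function name `concordance_index`, the toy follow-up data, and the simplification of skipping pairs with tied follow-up times.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style c-index for right-censored data (illustrative sketch).

    times:       follow-up time per subject
    events:      1 if the event was observed, 0 if censored
    risk_scores: model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # Let a be the subject with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event has the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.8, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_scores)
```

On this toy cohort, 4 of the 5 comparable pairs are ranked correctly, giving a c-index of 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.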
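Objective: This study leverages the rich diversity of the All of Us Research Program (All of Us) dataset to devise a predictive model for cardiovascular disease (CVD) in breast cancer (BC) survivors. Central to this work is the creation of a robust data integration pipeline that synthesizes electronic health records (EHRs), patient surveys, and genomic data, while upholding fairness across demographic variables. Materials and methods: We developed a universal data wrangling pipeline to process and merge the heterogeneous data sources of the All of Us dataset, address missingness and variance in the data, and align disparate data modalities into a coherent framework for analysis. Using a composite feature set including EHR, lifestyle, and social determinants of health (SDoH) data, we employed Adaptive Lasso and Random Forest regression models to predict 6 CVD outcomes. The models were evaluated using the c-index and the time-dependent area under the receiver operating characteristic curve over a 10-year period. Results: The Adaptive Lasso model showed consistent performance across most CVD outcomes, while the Random Forest model performed particularly well in predicting outcomes such as transient ischemic attack when the full multi-modality feature set was incorporated. Feature importance analysis revealed age and previous coronary events as dominant predictors across CVD outcomes, with SDoH clustering labels highlighting the nuanced impact of social factors. Discussion: The development of both a Cox-based predictive model and a Random Forest regression model illustrates how All of Us data can be applied, by integrating EHR and patient survey data, to advance precision medicine. The inclusion of SDoH clustering labels revealed the significant impact of sociobehavioral factors on patient outcomes, emphasizing the importance of comprehensive health determinants in predictive models. Despite these advancements, limitations include the exclusion of genetic data, broad categorization of CVD conditions, and the need for fairness analyses to ensure equitable model performance across diverse populations. Future work should refine clinical and social variable measurements, incorporate advanced imputation techniques, and explore additional predictive algorithms to enhance model precision and fairness. Conclusion: This study demonstrates the utility of the diverse All of Us dataset for developing a multi-modality predictive model for CVD in BC survivors. The data integration pipeline and subsequent predictive models establish a methodological foundation for future research into personalized healthcare.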
{"title":"Multi-modality risk prediction of cardiovascular diseases for breast cancer cohort in the All of Us Research Program.","authors":"Han Yang, Sicheng Zhou, Zexi Rao, Chen Zhao, Erjia Cui, Chetan Shenoy, Anne H Blaes, Nishitha Paidimukkala, Jinhua Wang, Jue Hou, Rui Zhang","doi":"10.1093/jamia/ocae199","DOIUrl":"10.1093/jamia/ocae199","url":null,"abstract":"<p><strong>Objective: </strong>This study leverages the rich diversity of the All of Us Research Program (All of Us)'s dataset to devise a predictive model for cardiovascular disease (CVD) in breast cancer (BC) survivors. Central to this endeavor is the creation of a robust data integration pipeline that synthesizes electronic health records (EHRs), patient surveys, and genomic data, while upholding fairness across demographic variables.</p><p><strong>Materials and methods: </strong>We have developed a universal data wrangling pipeline to process and merge heterogeneous data sources of the All of Us dataset, address missingness and variance in data, and align disparate data modalities into a coherent framework for analysis. Utilizing a composite feature set including EHR, lifestyle, and social determinants of health (SDoH) data, we then employed Adaptive Lasso and Random Forest regression models to predict 6 CVD outcomes. The models were evaluated using the c-index and time-dependent Area Under the Receiver Operating Characteristic Curve over a 10-year period.</p><p><strong>Results: </strong>The Adaptive Lasso model showed consistent performance across most CVD outcomes, while the Random Forest model excelled particularly in predicting outcomes like transient ischemic attack when incorporating the full multi-model feature set. 
Feature importance analysis revealed age and previous coronary events as dominant predictors across CVD outcomes, with SDoH clustering labels highlighting the nuanced impact of social factors.</p><p><strong>Discussion: </strong>The development of both Cox-based predictive model and Random Forest Regression model represents the extensive application of the All of Us, in integrating EHR and patient surveys to enhance precision medicine. And the inclusion of SDoH clustering labels revealed the significant impact of sociobehavioral factors on patient outcomes, emphasizing the importance of comprehensive health determinants in predictive models. Despite these advancements, limitations include the exclusion of genetic data, broad categorization of CVD conditions, and the need for fairness analyses to ensure equitable model performance across diverse populations. Future work should refine clinical and social variable measurements, incorporate advanced imputation techniques, and explore additional predictive algorithms to enhance model precision and fairness.</p><p><strong>Conclusion: </strong>This study demonstrates the liability of the All of Us's diverse dataset in developing a multi-modality predictive model for CVD in BC survivors risk stratification in oncological survivorship. 
The data integration pipeline and subsequent predictive models establish a methodological foundation for future research into personalized healthcare.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"2800-2810"},"PeriodicalIF":4.7,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631116/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141767875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}