Pub Date: 2026-01-11 | eCollection Date: 2026-02-01 | DOI: 10.1093/jamiaopen/ooag001
João Brainer Clares de Andrade, Rafael P Gomes, Alexandre Cristiuma Robles, Thales Pardini Fagundes, George N Nunes Mendes
Objectives: Identifying patients at high risk for atrial fibrillation (AF) after cryptogenic stroke remains a challenge, particularly in settings with limited access to long-term cardiac monitoring. The AFibrisk platform, a free digital decision-support tool, integrates 19 validated AF prediction scores to support post-stroke triage. We aimed to assess the concordance of AFibrisk-supported classification decisions with expert electrophysiologist consensus and compare performance across evaluator groups with different levels of clinical experience.
Materials and methods: A prospective, cross-sectional concordance study was conducted using 29 standardized clinical vignettes. Evaluators (3 vascular neurologists, 4 cardiology residents, and 11 neurology residents) classified each case as high or low AF risk using AFibrisk outputs. Expert consensus served as the reference standard. Statistical analyses included inter-group comparisons, inter-rater reliability, and regression models adjusting for group size and response clustering.
Results: Vascular neurologists demonstrated the highest agreement with the reference standard (mean 90.3%), followed by cardiology residents (85.2%) and neurology residents (77.5%). Differences were statistically significant (ANOVA p = .0199; Kruskal-Wallis p = .0259). Neurology residents showed the greatest intra-group consistency (Light's κ = 0.607), despite lower accuracy. Classification errors differed by experience: residents tended to overestimate risk, while experts showed occasional underestimation. Overall, 30.1% of responses were "not classified," with the highest uncertainty among vascular neurologists (43.8%).
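Light's κ reported above is the mean of Cohen's κ over all rater pairs. A minimal pure-Python sketch (illustrative only, not the authors' implementation; the high/low labels are hypothetical):

```python
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(a)
    categories = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return 1.0 if p_expected == 1.0 else (p_observed - p_expected) / (1 - p_expected)

def lights_kappa(ratings):
    """Light's kappa: mean pairwise Cohen's kappa across all raters."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

With, say, three raters each labeling the same vignettes "H"/"L", the function returns a single averaged chance-corrected agreement value.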
Discussion and conclusion: AFibrisk improved alignment with expert judgment across evaluator groups and helped standardize decision-making. Our free platform may support AF risk stratification in low-resource environments and reinforce evidence-based heuristics among early-career clinicians, and it is available at www.afibrisk.net.
Title: A preliminary evaluation of AFibrisk: digital decision-support platform for atrial fibrillation risk assessment after cryptogenic stroke-a cross-sectional concordance study. JAMIA Open 9(1): ooag001.
Pub Date: 2026-01-08 | eCollection Date: 2026-02-01 | DOI: 10.1093/jamiaopen/ooaf175
Mohammed Al-Garadi, Rishi J Desai, Kerry Ngan, Michele LeNoue-Newton, Ruth M Reeves, Daniel Park, Jose J Hernández-Muñoz, Shirley V Wang, Judith C Maro, Candace C Fuller, Joshua Lin Kueiyu, Aida Kuzucan, Kevin Coughlin, Haritha Pillai, Melissa McPheeters, Jill Whitaker, Jessica A Buckner, Michael F McLemore, Dax M Westerman, Michael E Matheny
Objectives: To develop and validate machine learning (ML) models that predict probable cause of death (CoD) using structured electronic health record (EHR) data, unstructured clinical notes, and publicly available sources.
Materials and methods: This multi-institutional retrospective study was conducted across Vanderbilt University Medical Center (VUMC) and Massachusetts General Brigham (MGB), including deceased patients with encounters between October 1, 2015, and January 1, 2021, and confirmed death records. The cohort included 13 708 patients from VUMC and 34 839 from MGB. The primary outcome was underlying CoD categorized into the top 15 National Center for Health Statistics rankable causes, with others grouped as "Other." Performance was assessed using weighted area under the receiver operating characteristic curve (AUC) and F-measure.
Results: The XGBoost model using structured EHR data alone achieved weighted AUCs of 0.86 (95% CI, 0.84-0.88) at VUMC and 0.80 (95% CI, 0.79-0.80) at MGB. Adding unstructured notes improved performance, with weighted AUCs of 0.90 (95% CI, 0.88-0.93) at VUMC and 0.92 (95% CI, 0.91-0.92) at MGB. Adding publicly available data did not further improve performance. Cross-institutional validation revealed significant performance degradation.
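A "weighted AUC" in multiclass problems is commonly the support-weighted average of one-vs-rest AUCs; assuming that convention (the paper's exact computation is not shown here), a rank-based sketch in plain Python:

```python
def binary_auc(labels, scores):
    """AUC via the Mann-Whitney statistic: the probability that a random
    positive outranks a random negative, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def weighted_ovr_auc(y_true, scores_by_class):
    """Support-weighted one-vs-rest AUC over all classes."""
    n = len(y_true)
    total = 0.0
    for cls, scores in scores_by_class.items():
        labels = [1 if y == cls else 0 for y in y_true]
        total += (sum(labels) / n) * binary_auc(labels, scores)
    return total
```

Weighting by class support keeps rare CoD categories from dominating the summary metric.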
Discussion: Models integrating structured and unstructured EHR data show strong within-institution performance but limited generalizability across healthcare systems, highlighting challenges related to institutional data heterogeneity.
Conclusions: Machine learning models combining structured and unstructured EHR data accurately predict CoD within institutions but perform poorly across sites. Healthcare institutions may benefit from adopting robust processes for locally tailored models, and future research should focus on enhancing model generalizability while addressing unique institutional data environments.
Title: Enhancing cause of death prediction: development and validation of machine learning models using multimodal data across multiple health-care sites. JAMIA Open 9(1): ooaf175.
Pub Date: 2026-01-07 | eCollection Date: 2026-02-01 | DOI: 10.1093/jamiaopen/ooaf181
Ya-Yun Yeh, Hsin-Yueh Lin, Jingchuan Guo, Ramon C Sun, Sizun Jiang, Jiang Bian, Hao Dai
Objectives: Electronic health records (EHRs) rarely capture dietary detail, limiting diet-disease research. We aimed to develop machine learning (ML) computable phenotypes to identify high-fat diet (HFD) using variables typically available in EHRs.
Materials and methods: We used National Health and Nutrition Examination Survey (NHANES) 1999-2020 data, where 24-h dietary recall served as ground truth. Dietary fat intake was summarized into a score (0-30) based on percent energy from fat, carbohydrate, and protein; lower scores indicated HFD. We defined HFD at cutoffs of 10, 15, and 20, and trained ML models (Extreme Gradient Boosting, logistic regression, random forest) using EHR-compatible variables (demographics, comorbidities, labs, anthropometrics). Model interpretability was assessed using Shapley Additive Explanations. To evaluate clinical relevance, we compared cancer associations using ML-predicted vs true diet labels.
Results: Machine learning models classified HFD with good performance, strongest at broader definitions. Random forest achieved an F1-score of 0.79 (recall 0.74, precision 0.84) at cutoff 20. Key predictors included race/ethnicity, triglycerides, obesity metrics (body mass index and derived indices), and metabolic panel results.
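As a sanity check, the reported F1 is consistent with the harmonic mean of the reported precision and recall:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With precision 0.84 and recall 0.74, F1 is about 0.787,
# matching the reported 0.79 at cutoff 20.
```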
Discussion: These findings indicate that dietary patterns, though seldom recorded in EHRs, can be inferred from routinely available variables. The ability of ML-derived phenotypes to reproduce known diet-disease relationships underscores their epidemiologic validity. Top predictors also align with established biological pathways linking obesity, lipid metabolism, and cancer risk, supporting plausibility.
Conclusion: A high-fat dietary pattern can be inferred from EHR-compatible variables using ML-based phenotyping. This approach offers a scalable tool to integrate diet into EHR-based research and precision medicine.
Title: Inferring high-fat dietary patterns from electronic health record data using machine learning. JAMIA Open 9(1): ooaf181.
Pub Date: 2025-12-23 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf166
Katherine Parkin, Ryan Crowley, Rachel Sippy, Shabina Hayat, Yi Zhang, Emily Brewis, Nicole Marshall, Tara Ramsay-Patel, Vahgisha Thirugnanasampanthan, Guy Skinner, Peter Fonagy, Carol Brayne, Anna Moore
Objectives: To create a theoretical framework of mental health risk factors to inform the development of prediction models for young people's mental health problems.
Materials and methods: We created an initial prototype theoretical framework using a rapid literature search and stakeholder discussion. A snowball sampling approach identified experts for the Delphi study. Round 1 sought consensus on the overall approach, framework domains, and life course stages. Round 2 aimed to establish the points in the life course where exposure to specific risk factors would be most influential. Round 3 ranked risk factors within domains by their predictive importance for young people's mental health problems.
Results: The final framework reached consensus after 3 rounds and included 287 risk factors across 8 domains and 5 life course stages. Twenty-five experts completed round 3. Domains ranked as most important were "Social and Environmental" and "Psychological and Mental Health." Ranked lists of risk factors within domains and heat maps showing the salience of risk factors across life course stages were generated.
Discussion: The study integrated multidisciplinary expert perspectives and prioritized health equity throughout the framework's development. The ranked risk factor lists and life stage heat maps support the targeted inclusion of risk factors across developmental stages in prediction models.
Conclusion: This theoretical framework provides a roadmap of important risk factors for inclusion in early identification models to enhance the predictive accuracy of childhood mental health problems. It offers a useful theoretical reference point to support model building for those without domain expertise.
Title: Development of a risk factor framework to inform machine learning prediction of young people's mental health problems: a Delphi study. JAMIA Open 8(6): ooaf166.
Pub Date: 2025-12-17 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf164
Sarah Y Bessen, Sean Tackett, Kimberly S Peairs, Lisa Christopher-Stine, Charles M Stewart, Lee D Biddison, Maria Oliva-Hemker, Jennifer K Lee
Objectives: Electronic health record (EHR) work may affect women and men physicians differently. Identifying gender discrepancies in EHR work across different specialties may inform strategies to reduce EHR burdens.
Materials and methods: We retrospectively evaluated EHR use by ambulatory physicians in 4 specialties (2 procedural [cardiology and gastroenterology] and 2 nonprocedural [internal medicine and rheumatology]) during 1 year at a large academic medical institution. Gender differences in EHR and clinical workload across specialties were evaluated by analysis of variance. Mixed-effects linear regression models analyzed gender differences in EHR workload controlling for specialty. Significant differences were additionally examined by stratifying procedural and nonprocedural specialties.
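The analysis of variance used here compares between-group to within-group variability; a compact one-way F-statistic sketch in plain Python (illustrative, not the study's analysis code):

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group mean square
    divided by within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F indicates that group means (here, EHR workload by specialty) differ by more than within-group noise would explain.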
Results: Clinical and EHR workload varied across specialties (P <.05), though scheduled clinical workload did not differ by gender. Controlling for specialty, women physicians spent more time per appointment on In Basket messages (P =.001), sent more Secure Chat messages per appointment (P =.003), and spent more time in the EHR outside 7:00 AM-7:00 PM (P <.001) than men. Gender differences in messaging were concentrated among the procedural physicians. Women procedural physicians spent more time on In Basket messages (P <.001) and sent more Secure Chat messages (P =.007) than men, whereas these differences did not occur among nonprocedural physicians.
Discussion: Women physicians had greater EHR burdens despite scheduled clinical workloads similar to those of men. The greater messaging workload predominantly affected women procedural physicians.
Conclusion: Gender disparities in EHR burden in ambulatory specialties vary between procedural and nonprocedural fields. Future research is needed to mitigate gender inequity in EHR workloads.
Title: Higher electronic health record burden among women physicians in academic ambulatory medicine. JAMIA Open 8(6): ooaf164.
Pub Date: 2025-12-14 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf168
Robin Austin, Malin Britt Lalich, Katy Stewart, Jonna Zarbano, Matthew Byrne, Melissa D Pinto, Elizabeth E Umberfield
Objectives: The primary objective of this research is to assess the content coverage of nursing data within a publicly available common data model (CDM), focusing on how nursing data, documented in flowsheets, are represented within the model.
Materials and methods: This mapping study was informed by previous evaluation studies and serves as a framework for evaluating information resources, including guiding their development and implementation. The overall research process consists of 4 steps: (1) identify a CDM; (2) define evaluation criteria; (3) map nursing flowsheet data; and (4) apply evaluation criteria.
Results: Overall, 65.5% (n = 1170) of the flowsheet concepts were mapped to Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and Logical Observation Identifiers Names and Codes (LOINC) target codes, and 56.0% (n = 1831) of the flowsheet values were mapped to SNOMED CT and LOINC target codes. The flowsheet concepts had a higher average mapping time per concept/reviewer (1.19 min) as compared to the average mapping time per value/reviewer (0.64 min).
Discussion: This mapping study demonstrated the progress and ongoing challenges of mapping nursing data to a national common data model. However, the ability to use nursing data at scale in a national CDM remains limited until more comprehensive mapping is completed.
Conclusion: This mapping study identifies a significant gap in integrating nursing data into a national common data model, highlighting an opportunity to enhance patient care through improved real-time insights and evidence-based nursing practices. Addressing this gap can help shape policies that prioritize the inclusion of nursing data. Additionally, aligning nursing data at scale can advance research, increase efficiency, and optimize nurse-sensitive patient outcomes.
Title: Exploring common data model coverage of nursing flowsheet data: a pilot study using SNOMED CT and LOINC mapping. JAMIA Open 8(6): ooaf168.
Pub Date: 2025-12-10 | eCollection Date: 2025-12-01 | DOI: 10.1093/jamiaopen/ooaf124
[This corrects the article DOI: 10.1093/jamiaopen/ooz061.].
Title: Correction to: Response to survey directed to patient portal members differs by age, race, and healthcare utilization. JAMIA Open 8(6): ooaf124.
Pub Date: 2025-12-09. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf127
Juan Antonio Lossio-Ventura, Samuel Frank, Grace Ringlein, Kirsten Bonson, Ardyn Olszko, Abbey Knobel, Daniel S Pine, Jennifer B Freeman, Kristen Benito, David C Jangraw, Francisco Pereira
Objective: To develop and evaluate an automated classification system for labeling Exposure Process Coding System (EPCS) quality codes, specifically exposure and encourage events, during in-person exposure therapy sessions using automatic speech recognition (ASR) and natural language processing techniques.
Materials and methods: The system was trained and tested on 360 manually labeled pediatric Obsessive-Compulsive Disorder (OCD) therapy sessions from 3 clinical trials. Audio recordings were transcribed using ASR tools (OpenAI's Whisper and Google Speech-to-Text). Transcription accuracy was evaluated via word error rate (WER) on manual transcriptions of 2-minute audio segments compared against ASR-generated transcripts. The resulting text was analyzed with transformer-based models, including Bidirectional Encoder Representations from Transformers (BERT), Sentence-BERT, and Meta Llama 3. Models were trained to predict EPCS codes in 2 classification settings: sequence-level classification, where events are labeled in delimited text chunks, and token-level classification, where event boundaries are unknown. Classification was performed either with fine-tuned transformer-based models, or with logistic regression on embeddings produced by each model.
Results: With respect to transcription accuracy, Whisper outperformed Google Speech-to-Text with a lower WER (0.31 vs 0.51). In the sequence-level classification setting, Llama 3 models achieved high performance, with area under the ROC curve (AUC) scores of 0.95 for exposure events and 0.75 for encourage events, outperforming traditional methods and standard BERT models. In the token-level setting, fine-tuned BERT models performed best, achieving AUC scores of 0.85 for exposure events and 0.75 for encourage events.
Discussion and conclusion: Current ASR and transformer-based models enable automated quality coding of in-person exposure therapy sessions. These findings demonstrate potential for real-time assessment in clinical practice and scalable research on effective therapy methods. Future work should focus on optimization, including improvements in ASR accuracy, expanding training datasets, and multimodal data integration.
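The WER used to compare the two ASR systems is the standard word-level edit distance normalized by reference length. A minimal stdlib sketch of the metric (illustrative only, not the authors' evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance between word sequences,
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One dropped word against a 6-word reference: 1 deletion / 6 words
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A lower WER (as Whisper's 0.31 vs 0.51 here) means fewer insertions, deletions, and substitutions per reference word.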
Automated classification of exposure and encourage events in speech data from pediatric OCD treatment. (JAMIA Open 2025;8(6):ooaf127)
Pub Date: 2025-12-09. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf156
Whitney Shae, Md Saiful Islam Saif, John Fife, Dinesh Pal Mudaranthakam, Dong Pei, Lisa Harlan-Williams, Jeffrey A Thompson, Devin C Koestler
Objectives: The objective of this study was to develop and test natural language processing (NLP) methods for screening and, ultimately, predicting the cancer relevance of peer-reviewed publications.
Materials and methods: Two datasets were used: (1) manually curated publications labeled for cancer relevance, co-authored by members of The University of Kansas Cancer Center (KUCC) and (2) a derived dataset containing cancer-related abstracts from American Association for Cancer Research journals and noncancer-related abstracts from other medical journals. Two text encoding methods were explored: term frequency-inverse document frequency (TF-IDF) vectorization and various BERT embeddings. These representations served as inputs to 3 supervised machine learning classifiers: Support Vector Classification (SVC), Gradient Boosting Classification, and Multilayer Perceptron (MLP) neural networks. Model performance was evaluated by comparing predictions to the "true" cancer-relevant labels in a withheld test set.
Results: All machine learning models performed best when trained and tested within the derived dataset. Across the datasets, SVC and MLP both exhibited strong performance, with F1 scores as high as 0.976 and 0.997, respectively. BioBERT embeddings resulted in slightly higher metrics when compared to TF-IDF vectorization across most models.
Discussion: Models trained on the derived data performed very well internally; however, performance weakened when these models were tested on the KUCC dataset. This finding highlights the subjective nature of cancer-relevance determinations. In contrast, KUCC-trained models retained high predictive performance when tested on the derived dataset's classifications, suggesting that models trained on the KUCC dataset may be suitable for wider cancer-relevance prediction.
Conclusions: Overall, our results suggest that NLP can effectively automate the classification of cancer-relevant publications, enhancing research productivity tracking; however, great care should be taken in selecting the appropriate dataset, text representation, and machine learning approach.
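TF-IDF, one of the two text encodings compared above, weights a term by its frequency in a document discounted by how many documents contain it. A minimal pure-Python sketch of the representation (the study would have used a library implementation; this only illustrates the weighting):

```python
import math
from collections import Counter

def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Map each document to {term: tf * idf}, with tf = count / doc length
    and idf = log(n_docs / doc_frequency). Unsmoothed for clarity."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

# Toy abstracts: "cancer" is discriminative, so it gets positive weight in doc 0 only
vecs = tfidf(["tumor growth in cancer tissue",
              "hospital staffing and scheduling"])
```

Terms that occur in every document get an IDF of log(1) = 0, which is why corpus-wide filler words contribute nothing to the classifier's input.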
Utilizing natural language processing to identify cancer-relevant publications at a National Cancer Institute-designated cancer center. (JAMIA Open 2025;8(6):ooaf156)
Pub Date: 2025-12-06. eCollection Date: 2025-12-01. DOI: 10.1093/jamiaopen/ooaf165
Yunfeng Liang, Lin Zou, Millie Ming Rong Goh, Alvin Jia Hao Ngeow, Ngiap Chuan Tan, Andy Wee An Ta, Han Leong Goh
Objective: Neonatal jaundice monitoring is resource-intensive. Existing artificial intelligence methods use image or clinical data, but none systematically combine both or compare feature contributions. This study fills that gap by extracting and analyzing multimodal features on a large dataset, identifying an optimal feature set for accurate, accessible jaundice assessment.
Materials and methods: This study collected clinical data and skin images from 3 body regions of 633 neonates, generating 460 features across 4 categories. Four tree-based models were used to predict total serum bilirubin levels and feature importance analysis guided the selection of an optimal feature set.
Results: The optimal performance was achieved using the Light Gradient Boosting Machine (LGBM) model with 140 selected features, yielding a root mean square error (RMSE) of 2.0477 mg/dL and a Pearson correlation of 0.8435. This represents a performance gain of over 10% in RMSE compared to models using only a single data modality. Moreover, selecting the top 30 features based on SHapley Additive exPlanation (SHAP) allows for a substantial reduction in data dimensionality, while maintaining performance within 5% of the optimal model.
Discussion: Color features contributed over 60% of the total importance, with clinical data adding more than 25%, led by hour of life. Light temperature also affected predictions, while texture features had minimal impact. Among body regions, the abdomen provided the most informative signals for jaundice severity.
Conclusion: The proposed algorithm shows promise for real-world use by enabling timely, automated jaundice assessment for families, while also offering insights for future research and broader medical applications.
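The two headline metrics here, RMSE (2.0477 mg/dL) and Pearson correlation (0.8435), compute as follows. A stdlib sketch with toy bilirubin values (not the study's data or code):

```python
import math

def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean square error: average squared residual, square-rooted."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical measured vs predicted total serum bilirubin (mg/dL)
measured  = [5.1, 8.4, 12.0, 15.3]
predicted = [5.9, 7.6, 11.2, 16.1]
print(rmse(measured, predicted), pearson(measured, predicted))
```

RMSE carries the outcome's units (mg/dL here), so it reads directly as typical prediction error, while Pearson correlation is unitless and measures only linear agreement.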
Multimodal feature analysis for automated neonatal jaundice assessment using machine learning. (JAMIA Open 2025;8(6):ooaf165)