Melissa A Gunderson, Peter Embí, Charles P Friedman, Genevieve B Melton
Objectives: There is rapidly growing interest in learning health systems (LHSs) nationally and globally. While the critical role of informatics is recognized, the informatics community has been relatively slow to formalize LHS as a priority area.
Materials and methods: We compiled results from a short survey of LHS leaders and American Medical Informatics Association (AMIA) members, discussion from an LHS reception at the AMIA annual meeting, and a follow-up survey to inform priorities at the intersection of LHS and informatics.
Results: We present opportunities at the intersection of informatics and LHS, which fell into the themes of Understanding and Context, Shared Resources, Collaboration, Education, Data, Evaluation, and Patient Centeredness. Immediate LHS informatics priorities identified include establishing informatics LHS forum(s), case reports of LHS informatics successes and failures, LHS informatics education resources, and improved understanding of LHS principles in informatics.
Conclusion: Increased informatics and LHS alignment is critical for advancing this transformative national priority.
Published as: "Opportunities for the informatics community to advance learning health systems." Journal of the American Medical Informatics Association. 2025:253-257. doi:10.1093/jamia/ocae281. PMCID: PMC11648723.
Xiomara T Gonzalez, Karen Steger-May, Joanna Abraham
Objectives: Successful implementation of machine learning-augmented clinical decision support systems (ML-CDSS) in perioperative care requires the prioritization of patient-centric approaches to ensure alignment with societal expectations. We assessed general public and surgical patient attitudes and perspectives on ML-CDSS use in perioperative care.
Materials and methods: A sequential explanatory study was conducted. Stage 1 collected public opinions through a survey. Stage 2 ascertained surgical patients' experiences and attitudes via focus groups and interviews.
Results: For Stage 1, data from 281 respondents (140 males [49.8%]) were considered. Among participants without ML awareness, males were almost three times more likely than females to report greater acceptance (OR = 2.97; 95% CI, 1.36-6.49) and embrace (OR = 2.74; 95% CI, 1.23-6.09) of ML-CDSS use by perioperative teams. Males were almost twice as likely as females to report greater acceptance across all perioperative phases, with ORs ranging from 1.71 to 2.07. In Stage 2, insights from 10 surgical patients revealed unanimous agreement that ML-CDSS should primarily serve a supportive function. The pre- and post-operative phases were explicitly identified as settings where ML-CDSS can enhance care delivery. Patients requested that education on ML-CDSS's role in their care be disseminated by surgeons across multiple platforms.
Discussion and conclusion: The general public and surgical patients are receptive to ML-CDSS use throughout their perioperative care provided its role is auxiliary to perioperative teams. However, the integration of ML-CDSS into perioperative workflows presents unique challenges for healthcare settings. Insights from this study can inform strategies to support large-scale implementation and adoption of ML-CDSS by patients in all perioperative phases. Key strategies to promote the feasibility and acceptability of ML-CDSS include clinician-led discussions about ML-CDSS's role in perioperative care, established metrics to evaluate the clinical utility of ML-CDSS, and patient education.
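The odds ratios and 95% confidence intervals reported above follow the standard Wald construction on the log-odds scale. As a reminder of the arithmetic, here is a minimal sketch computing an OR and its CI from a 2x2 table; the counts are illustrative only, not the study's data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
    a/b = cases/non-cases in group 1, c/d = cases/non-cases in group 2."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Illustrative counts only, not the study's survey data:
or_, lo, hi = odds_ratio_ci(40, 30, 20, 45)
```

Note how a modest cell count (here 20) widens the interval substantially, which is why the reported CIs (e.g., 1.36-6.49) span a large range despite a sizable OR.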
Published as: "Just another tool in their repertoire: uncovering insights into public and patient perspectives on clinicians' use of machine learning in perioperative care." Journal of the American Medical Informatics Association. 2025:150-162. doi:10.1093/jamia/ocae257. PMCID: PMC11648718.
Balu Bhasuran, Katharina Schmolly, Yuvraaj Kapoor, Nanditha Lakshmi Jayakumar, Raymond Doan, Jigar Amin, Stephen Meninger, Nathan Cheng, Robert Deering, Karl Anderson, Simon W Beaven, Bruce Wang, Vivek A Rudrapatna
Background: Acute hepatic porphyria (AHP) is a group of rare but treatable conditions associated with diagnostic delays of 15 years on average. The advent of electronic health records (EHR) data and machine learning (ML) may improve the timely recognition of rare diseases like AHP. However, prediction models can be difficult to train given the limited case numbers, unstructured EHR data, and selection biases intrinsic to healthcare delivery. We sought to train and characterize models for identifying patients with AHP.
Methods: This diagnostic study used structured and notes-based EHR data from 2 University of California centers, UCSF (2012-2022) and UCLA (2019-2022). The data were split into 2 cohorts (referral and diagnosis) and used to develop models that predict (1) who will be referred for acute porphyria testing, among those who presented with abdominal pain (a cardinal symptom of AHP), and (2) who will test positive, among those referred. The referral cohort consisted of 747 patients referred for testing and 99 849 contemporaneous patients who were not. The diagnosis cohort consisted of 72 confirmed AHP cases and 347 patients who tested negative. The case cohort was 81% female and 6-75 years old at the time of diagnosis. Candidate models used a range of architectures. Feature selection was semi-automated and incorporated publicly available data from knowledge graphs. Our primary outcome was the F-score on an outcome-stratified test set.
Results: The best center-specific referral models achieved an F-score of 86%-91%. The best diagnosis model achieved an F-score of 92%. To further test our model, we contacted 372 current patients who lack an AHP diagnosis but were predicted by our models as potentially having it (≥10% probability of referral, ≥50% of testing positive). However, we were only able to recruit 10 of these patients for biochemical testing, all of whom were negative. Nonetheless, post hoc evaluations suggested that these models could identify 71% of cases earlier than their diagnosis date, saving 1.2 years.
Conclusions: ML can reduce diagnostic delays in AHP and other rare diseases. Robust recruitment strategies and multicenter coordination will be needed to validate these models before they can be deployed.
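The evaluation pattern described above, an outcome-stratified held-out split scored with the F-measure, can be sketched as follows. The gradient-boosted classifier and the synthetic class-imbalanced data are stand-ins, not the paper's models or EHR-derived features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, class-imbalanced stand-in (~10% positives), loosely
# mirroring a rare referral/diagnosis outcome.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
# Outcome-stratified split: the test set preserves the class balance.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
f = f1_score(y_te, clf.predict(X_te))
```

Stratifying by outcome matters here because with rare positives an unstratified split can leave too few cases in the test set for the F-score to be stable.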
Published as: "Reducing diagnostic delays in acute hepatic porphyria using health records data and machine learning." Journal of the American Medical Informatics Association. 2025:63-70. doi:10.1093/jamia/ocae141. PMCID: PMC11648717.
Objective: To understand barriers to obtaining and using interoperable information at US hospitals.
Materials and methods: Using 2023 nationally representative survey data on US hospitals (N = 2420), we examined major and minor barriers to exchanging information with other organizations, and how barriers vary by hospital characteristics and methods used to obtain information. Using a series of regression models, we examined how hospital experiences with barriers relate to routine use of information at responding hospitals.
Results: In 2023, most hospitals experienced at least one minor (81%) or major (62%) barrier to exchange, with the most common major barriers relating to differences in vendors and in exchange partners' capabilities. Higher-resourced hospitals and those often using network-based exchange tended to experience more minor barriers, whereas lower-resourced hospitals and those often relying on mail/fax or direct access to outside electronic health records experienced more major barriers. In multivariate regression, hospitals indicating that "Patient matching" or "Costs to exchange" was a major or minor barrier showed the strongest independent negative association with reporting that providers at their hospital frequently use information from outside organizations.
Discussion: Despite progress in interoperable exchange, various barriers remain. The prevalence of barriers varied by hospital type and methods used, with barriers more often preventing exchange for lower-resourced hospitals and those using outdated exchange methods.
Conclusion: While several technical and policy efforts are underway to address prevalent barriers, it will be important to monitor whether efforts are successful in ensuring information from outside organizations can be seamlessly exchanged and used to inform patient care.
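The regression analysis described above can be sketched roughly as a logistic model of frequent information use on barrier indicators. The data below are simulated with assumed effect sizes; only the survey's sample size is borrowed, and the three barrier names are hypothetical placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 2420                                       # the survey's sample size
# Simulated 0/1 indicators for three hypothetical barriers
# (e.g., patient matching, exchange costs, vendor differences).
barriers = rng.integers(0, 2, size=(n, 3))
# Assumed (not estimated) effects: barriers lower the odds of frequent
# use of outside information.
logit = 0.8 - 1.2 * barriers[:, 0] - 0.9 * barriers[:, 1] - 0.2 * barriers[:, 2]
frequent_use = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression().fit(barriers, frequent_use)
coefs = model.coef_[0]                         # negative -> barrier reduces use
```

Including all barrier indicators in one model is what makes the reported associations "independent": each coefficient is adjusted for the presence of the other barriers.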
Published as: Jordan Everson, Chelsea Richwine. "Barriers to obtaining and using interoperable information among non-federal acute care hospitals." Journal of the American Medical Informatics Association. 2025:20-27. doi:10.1093/jamia/ocae263. PMCID: PMC11648725.
Christine A Sinsky, Lisa Rotenstein, A Jay Holmgren, Nate C Apathy
Objective: To quantify how many patient scheduled hours would result in a 40-h work week (PSH40) for ambulatory physicians and to determine how PSH40 varies by specialty and practice type.
Methods: We calculated PSH40 for 186 188 ambulatory physicians across 395 organizations from November 2021 through April 2022 stratified by specialty.
Results: Median PSH40 for the sample was 33.2 h (IQR: 28.7-36.5). PSH40 was lowest in infectious disease (26.2, IQR: 21.6-31.1), geriatrics (27.2, IQR: 21.5-32.0) and hematology (28.6, IQR: 23.6-32.6) and highest in plastic surgery (35.7, IQR: 32.8-37.7), pain medicine (35.8, IQR: 32.6-37.9) and sports medicine (36.0, IQR: 33.3-38.1).
Discussion: Health system leaders and physicians will benefit from data-driven and transparent discussions about work-hour expectations. The PSH40 measure can also be used to quantify the impact of variations in the clinical care environment on the in-person ambulatory patient care time available to physicians.
Conclusions: PSH40 is a novel measure that can be generated from vendor-derived metrics and used by operational leaders to inform work expectations. It can also support research into the impact of changes in the care environment on physicians' workload and capacity.
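Summarizing a measure like PSH40 as a median and IQR by specialty is a simple grouped-quantile computation. The sketch below uses made-up values (chosen so the toy medians echo the reported 27.2 and 36.0), not the study's event-log data:

```python
import pandas as pd

# Made-up PSH40-like values for two specialties; not the study's data.
df = pd.DataFrame({
    "specialty": ["geriatrics"] * 4 + ["sports medicine"] * 4,
    "psh40":     [21.5, 26.0, 28.4, 32.0, 33.3, 35.1, 36.9, 38.1],
})
summary = (df.groupby("specialty")["psh40"]
             .quantile([0.25, 0.5, 0.75])
             .unstack())            # columns: 0.25 (Q1), 0.5 (median), 0.75 (Q3)
```

`summary.loc[specialty, 0.5]` gives the median and the 0.25/0.75 columns give the IQR bounds, matching the "median (IQR)" format used in the results above.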
Published as: "The number of patient scheduled hours resulting in a 40-hour work week by physician specialty and setting: a cross-sectional study using electronic health record event log data." Journal of the American Medical Informatics Association. 2025:235-240. doi:10.1093/jamia/ocae266. PMCID: PMC11648726.
Objective: We proposed adopting billing models for secure messaging (SM) telehealth services that move beyond time-based metrics, focusing on the complexity and clinical expertise involved in patient care.
Materials and methods: We trained 8 classification machine learning (ML) models using providers' electronic health record (EHR) audit log data for patient-initiated non-urgent messages. Mixed effect modeling (MEM) analyzed significance.
Results: Accuracy and area under the receiver operating characteristics curve scores generally exceeded 0.85, demonstrating robust performance. MEM showed that knowledge domains significantly influenced SM billing, explaining nearly 40% of the variance.
Discussion: This study demonstrates that ML models using EHR audit log data can improve and predict billing in SM telehealth services, supporting billing models that reflect clinical complexity and expertise rather than time-based metrics.
Conclusion: Our research highlights the need for SM billing models beyond time-based metrics, using EHR audit log data to capture the true value of clinical work.
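The area-under-the-ROC-curve metric reported above can be computed directly via the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive outscores a randomly chosen negative. The labels and scores below are invented for illustration:

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney formulation: the fraction of
    positive/negative pairs where the positive outscores the
    negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented toy scores: mostly well-separated classes.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.4, 0.35, 0.6]
a = auc(labels, scores)
```

This quadratic-time version is fine for illustration; production code would use an optimized implementation such as scikit-learn's `roc_auc_score`.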
Published as: Dong-Gil Ko, Umberto Tachinardi, Eric J Warm. "Secure messaging telehealth billing in the digital age: moving beyond time-based metrics." Journal of the American Medical Informatics Association. 2025:230-234. doi:10.1093/jamia/ocae250. PMCID: PMC11648735.
Jejo D Koola, Karthik Ramesh, Jialin Mao, Minyoung Ahn, Sharon E Davis, Usha Govindarajulu, Amy M Perkins, Dax Westerman, Henry Ssemaganda, Theodore Speroff, Lucila Ohno-Machado, Craig R Ramsay, Art Sedrakyan, Frederic S Resnic, Michael E Matheny
Objectives: Traditional methods for medical device post-market surveillance often fail to accurately account for operator learning effects, leading to biased assessments of device safety. These methods struggle with non-linearity, complex learning curves, and time-varying covariates, such as physician experience. To address these limitations, we sought to develop a machine learning (ML) framework to detect and adjust for operator learning effects.
Materials and methods: A gradient-boosted decision tree ML method was used to analyze synthetic datasets that replicate the complexity of clinical scenarios involving high-risk medical devices. We designed this process to detect learning effects using a risk-adjusted cumulative sum method, quantify the excess adverse event rate attributable to operator inexperience, and adjust for these alongside patient factors in evaluating device safety signals. To maintain integrity, we employed blinding between data generation and analysis teams. Synthetic data used underlying distributions and patient feature correlations based on clinical data from the Department of Veterans Affairs between 2005 and 2012. We generated 2494 synthetic datasets with widely varying characteristics including number of patient features, operators and institutions, and the operator learning form. Each dataset contained a hypothetical study device, Device B, and a reference device, Device A. We evaluated accuracy in identifying learning effects and identifying and estimating the strength of the device safety signal. Our approach also evaluated different clinically relevant thresholds for safety signal detection.
Results: Our framework accurately identified the presence or absence of learning effects in 93.6% of datasets and correctly determined device safety signals in 93.4% of cases. The estimated device odds ratios' 95% confidence intervals were accurately aligned with the specified ratios in 94.7% of datasets. In contrast, a comparative model excluding operator learning effects significantly underperformed in detecting device signals and in accuracy. Notably, our framework achieved 100% specificity for clinically relevant safety signal thresholds, although sensitivity varied with the threshold applied.
Discussion: A machine learning framework, tailored for the complexities of post-market device evaluation, may provide superior performance compared to standard parametric techniques when operator learning is present.
Conclusion: Demonstrating the capacity of ML to overcome complex evaluative challenges, our framework addresses the limitations of traditional statistical methods in current post-market surveillance processes. By offering a reliable means to detect and adjust for learning effects, it may significantly improve medical device safety evaluation.
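A risk-adjusted cumulative sum chart of the kind named above can be sketched generically. This is a Steiner-style CUSUM testing for a doubling of adverse-event odds, offered as an illustration of the technique rather than the paper's implementation:

```python
import math

def risk_adjusted_cusum(outcomes, risks, odds_mult=2.0, threshold=2.0):
    """Steiner-style risk-adjusted CUSUM: accumulate log-likelihood-ratio
    weights testing whether the odds of an adverse event have been
    multiplied by odds_mult, resetting at zero. Returns the chart values
    and the first index crossing the threshold (None if it never signals)."""
    c, chart, first_signal = 0.0, [], None
    for y, p in zip(outcomes, risks):
        denom = 1 - p + odds_mult * p       # normalizer under shifted odds
        w = math.log(odds_mult / denom) if y else math.log(1 / denom)
        c = max(0.0, c + w)                 # one-sided chart, floored at zero
        chart.append(c)
        if first_signal is None and c >= threshold:
            first_signal = len(chart) - 1
    return chart, first_signal

# A run of adverse events in predicted-low-risk patients (p = 0.1)
# drives the chart up quickly.
chart, first_signal = risk_adjusted_cusum([1] * 8, [0.1] * 8)
```

Because the weights are risk-adjusted through each patient's predicted probability `p`, an operator treating sicker patients is not penalized the way a raw event count would penalize them, which is the property the framework above exploits for learning-effect detection.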
{"title":"A machine learning framework to adjust for learning effects in medical device safety evaluation.","authors":"Jejo D Koola, Karthik Ramesh, Jialin Mao, Minyoung Ahn, Sharon E Davis, Usha Govindarajulu, Amy M Perkins, Dax Westerman, Henry Ssemaganda, Theodore Speroff, Lucila Ohno-Machado, Craig R Ramsay, Art Sedrakyan, Frederic S Resnic, Michael E Matheny","doi":"10.1093/jamia/ocae273","DOIUrl":"10.1093/jamia/ocae273","url":null,"abstract":"<p><strong>Objectives: </strong>Traditional methods for medical device post-market surveillance often fail to accurately account for operator learning effects, leading to biased assessments of device safety. These methods struggle with non-linearity, complex learning curves, and time-varying covariates, such as physician experience. To address these limitations, we sought to develop a machine learning (ML) framework to detect and adjust for operator learning effects.</p><p><strong>Materials and methods: </strong>A gradient-boosted decision tree ML method was used to analyze synthetic datasets that replicate the complexity of clinical scenarios involving high-risk medical devices. We designed this process to detect learning effects using a risk-adjusted cumulative sum method, quantify the excess adverse event rate attributable to operator inexperience, and adjust for these alongside patient factors in evaluating device safety signals. To maintain integrity, we employed blinding between data generation and analysis teams. Synthetic data used underlying distributions and patient feature correlations based on clinical data from the Department of Veterans Affairs between 2005 and 2012. We generated 2494 synthetic datasets with widely varying characteristics including number of patient features, operators and institutions, and the operator learning form. Each dataset contained a hypothetical study device, Device B, and a reference device, Device A. 
We evaluated accuracy in identifying learning effects and identifying and estimating the strength of the device safety signal. Our approach also evaluated different clinically relevant thresholds for safety signal detection.</p><p><strong>Results: </strong>Our framework accurately identified the presence or absence of learning effects in 93.6% of datasets and correctly determined device safety signals in 93.4% of cases. The estimated device odds ratios' 95% confidence intervals were accurately aligned with the specified ratios in 94.7% of datasets. In contrast, a comparative model excluding operator learning effects significantly underperformed in detecting device signals and in accuracy. Notably, our framework achieved 100% specificity for clinically relevant safety signal thresholds, although sensitivity varied with the threshold applied.</p><p><strong>Discussion: </strong>A machine learning framework, tailored for the complexities of post-market device evaluation, may provide superior performance compared to standard parametric techniques when operator learning is present.</p><p><strong>Conclusion: </strong>Demonstrating the capacity of ML to overcome complex evaluative challenges, our framework addresses the limitations of traditional statistical methods in current post-market surveillance processes. 
By offering a reliable means to detect and adjust for learning effects, it may significantly improve medical device safety evaluation.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"206-217"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648715/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142548633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
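The risk-adjusted cumulative sum (CUSUM) step the abstract describes for detecting learning effects can be sketched as below. This is a minimal illustration under assumptions, not the authors' implementation: the log-likelihood-ratio form follows the standard Steiner-style risk-adjusted CUSUM, and the `odds_ratio` and `threshold` parameters are example values, not the ones used in the study.

```python
import math

def risk_adjusted_cusum(outcomes, predicted_risks, odds_ratio=2.0, threshold=4.0):
    """Risk-adjusted CUSUM over a sequence of cases in operator order.

    outcomes: 0/1 adverse-event indicators per case.
    predicted_risks: patient-level expected event probabilities (risk adjustment).
    odds_ratio: the alternative-hypothesis odds ratio the chart is tuned to detect.
    threshold: control limit h; signal fires when the statistic reaches it.

    Returns (list of CUSUM statistics, index of first signal or None).
    """
    s = 0.0
    stats = []
    signal_at = None
    for t, (y, p) in enumerate(zip(outcomes, predicted_risks)):
        # Log-likelihood-ratio weight: positive for events, negative otherwise,
        # scaled by the patient's predicted risk.
        denom = 1.0 - p + odds_ratio * p
        w = math.log(odds_ratio / denom) if y else math.log(1.0 / denom)
        s = max(0.0, s + w)  # the chart resets at zero
        stats.append(s)
        if signal_at is None and s >= threshold:
            signal_at = t
    return stats, signal_at
```

Because the statistic resets at zero and accumulates only unexplained excess events, a run of early adverse outcomes from an inexperienced operator pushes it over the control limit quickly, while isolated high-risk cases do not.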
Andrea Marheim Storås, Steffen Mæland, Jonas L Isaksen, Steven Alexander Hicks, Vajira Thambawita, Claus Graff, Hugo Lewi Hammer, Pål Halvorsen, Michael Alexander Riegler, Jørgen K Kanters
Objective: Evaluate popular explanation methods that use heatmap visualizations to explain the predictions of deep neural networks for electrocardiogram (ECG) analysis, and provide recommendations for selecting explanation methods.
Materials and methods: A residual deep neural network was trained on ECGs to predict intervals and amplitudes. Nine commonly used explanation methods (Saliency, Deconvolution, Guided backpropagation, Gradient SHAP, SmoothGrad, Input × gradient, DeepLIFT, Integrated gradients, GradCAM) were qualitatively evaluated by medical experts and objectively evaluated using a perturbation-based method.
Results: No single explanation method consistently outperformed the other methods, but some methods were clearly inferior. We found considerable disagreement between the human expert evaluation and the objective evaluation by perturbation.
Discussion: The best explanation method depended on the ECG measure. To ensure that future explanations of deep neural networks for medical data analyses are useful to medical experts, data scientists developing new explanation methods should collaborate closely with domain experts. Because no explanation method performs best in all use cases, several methods should be applied.
Conclusion: Several explanation methods should be used to determine the most suitable approach.
{"title":"Evaluating gradient-based explanation methods for neural network ECG analysis using heatmaps.","authors":"Andrea Marheim Storås, Steffen Mæland, Jonas L Isaksen, Steven Alexander Hicks, Vajira Thambawita, Claus Graff, Hugo Lewi Hammer, Pål Halvorsen, Michael Alexander Riegler, Jørgen K Kanters","doi":"10.1093/jamia/ocae280","DOIUrl":"10.1093/jamia/ocae280","url":null,"abstract":"<p><strong>Objective: </strong>Evaluate popular explanation methods using heatmap visualizations to explain the predictions of deep neural networks for electrocardiogram (ECG) analysis and provide recommendations for selection of explanation methods.</p><p><strong>Materials and methods: </strong>A residual deep neural network was trained on ECGs to predict intervals and amplitudes. Nine commonly used explanation methods (Saliency, Deconvolution, Guided backpropagation, Gradient SHAP, SmoothGrad, Input × gradient, DeepLIFT, Integrated gradients, GradCAM) were qualitatively evaluated by medical experts and objectively evaluated using a perturbation-based method.</p><p><strong>Results: </strong>No single explanation method consistently outperformed the other methods, but some methods were clearly inferior. We found considerable disagreement between the human expert evaluation and the objective evaluation by perturbation.</p><p><strong>Discussion: </strong>The best explanation method depended on the ECG measure. To ensure that future explanations of deep neural networks for medical data analyses are useful to medical experts, data scientists developing new explanation methods should collaborate closely with domain experts. 
Because there is no explanation method that performs best in all use cases, several methods should be applied.</p><p><strong>Conclusion: </strong>Several explanation methods should be used to determine the most suitable approach.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"79-88"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142591550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
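The perturbation-based evaluation the abstract mentions can be sketched in a dependency-light way: occlude the samples that an attribution map ranks highest and measure how far the model output moves. This is an illustrative sketch, not the study's residual network or any of its nine methods; the linear stand-in model is assumed here only because its input gradient (and hence its Saliency map) is exactly |w|, which keeps the example self-contained.

```python
import numpy as np

def gradient_saliency(weights):
    # For a linear model f(x) = w . x, the gradient of the output with
    # respect to the input is w itself, so the saliency map is |w|.
    # Deep networks obtain the same quantity via backpropagation.
    return np.abs(weights)

def occlusion_drop(model, signal, saliency, k, baseline=0.0):
    # Occlude (set to baseline) the k most-salient samples and return the
    # absolute change in model output; a faithful attribution map should
    # produce a larger drop than occluding low-salience samples.
    idx = np.argsort(saliency)[::-1][:k]
    perturbed = signal.copy()
    perturbed[idx] = baseline
    return abs(model(signal) - model(perturbed))

# Toy check on a 4-sample "signal": occluding the two most-salient samples
# (weights 5.0 and 3.0) moves the output more than occluding the two least
# salient ones (weights 1.0 and 0.1).
w = np.array([5.0, 1.0, 0.1, 3.0])
model = lambda x: float(w @ x)
ecg = np.ones(4)
sal = gradient_saliency(w)
top_drop = occlusion_drop(model, ecg, sal, k=2)
low_drop = occlusion_drop(model, ecg, -sal, k=2)  # negated saliency ranks least-salient first
```

Ranking explanation methods by this drop is what makes the evaluation "objective": it needs no human judgment, which is also why it can disagree with expert ratings, as the study found.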
Braja Gopal Patra, Lauren A Lepow, Praneet Kasi Reddy Jagadeesh Kumar, Veer Vekaria, Mohit Manoj Sharma, Prakash Adekkanattu, Brian Fennessy, Gavin Hynes, Isotta Landi, Jorge A Sanchez-Ruiz, Euijung Ryu, Joanna M Biernacka, Girish N Nadkarni, Ardesheer Talati, Myrna Weissman, Mark Olfson, J John Mann, Yiye Zhang, Alexander W Charney, Jyotishman Pathak
Objectives: Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented in narrative clinical notes rather than as structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive process of extraction of such information.
Materials and methods: Psychiatric encounter notes from Mount Sinai Health System (MSHS, n = 300) and Weill Cornell Medicine (WCM, n = 225) were annotated to create a gold-standard corpus. A rule-based system (RBS) involving lexicons and a large language model (LLM) using FLAN-T5-XL were developed to identify mentions of SS and SI and their subcategories (eg, social network, instrumental support, and loneliness).
Results: For extracting SS/SI, the RBS obtained higher macroaveraged F1-scores than the LLM at both MSHS (0.89 versus 0.65) and WCM (0.85 versus 0.82). For extracting the subcategories, the RBS also outperformed the LLM at both MSHS (0.90 versus 0.62) and WCM (0.82 versus 0.81).
Discussion and conclusion: Unexpectedly, the RBS outperformed the LLM across all metrics. An intensive review demonstrated that this finding is due to the divergent approaches taken by the RBS and the LLM. The RBS was designed and refined to follow the same specific rules as the gold-standard annotations. Conversely, the LLM was more inclusive in its categorization and conformed to common English-language understanding. Both approaches offer advantages, although additional replication studies are warranted.
{"title":"Extracting social support and social isolation information from clinical psychiatry notes: comparing a rule-based natural language processing system and a large language model.","authors":"Braja Gopal Patra, Lauren A Lepow, Praneet Kasi Reddy Jagadeesh Kumar, Veer Vekaria, Mohit Manoj Sharma, Prakash Adekkanattu, Brian Fennessy, Gavin Hynes, Isotta Landi, Jorge A Sanchez-Ruiz, Euijung Ryu, Joanna M Biernacka, Girish N Nadkarni, Ardesheer Talati, Myrna Weissman, Mark Olfson, J John Mann, Yiye Zhang, Alexander W Charney, Jyotishman Pathak","doi":"10.1093/jamia/ocae260","DOIUrl":"10.1093/jamia/ocae260","url":null,"abstract":"<p><strong>Objectives: </strong>Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented in narrative clinical notes rather than as structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive process of extraction of such information.</p><p><strong>Materials and methods: </strong>Psychiatric encounter notes from Mount Sinai Health System (MSHS, n = 300) and Weill Cornell Medicine (WCM, n = 225) were annotated to create a gold-standard corpus. A rule-based system (RBS) involving lexicons and a large language model (LLM) using FLAN-T5-XL were developed to identify mentions of SS and SI and their subcategories (eg, social network, instrumental support, and loneliness).</p><p><strong>Results: </strong>For extracting SS/SI, the RBS obtained higher macroaveraged F1-scores than the LLM at both MSHS (0.89 versus 0.65) and WCM (0.85 versus 0.82). For extracting the subcategories, the RBS also outperformed the LLM at both MSHS (0.90 versus 0.62) and WCM (0.82 versus 0.81).</p><p><strong>Discussion and conclusion: </strong>Unexpectedly, the RBS outperformed the LLMs across all metrics. 
An intensive review demonstrated that this finding is due to the divergent approaches taken by the RBS and the LLM. The RBS was designed and refined to follow the same specific rules as the gold-standard annotations. Conversely, the LLM was more inclusive in its categorization and conformed to common English-language understanding. Both approaches offer advantages, although additional replication studies are warranted.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"218-226"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648716/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142479224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
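A rule-based lexicon matcher of the kind the study describes can be sketched as below. The lexicon entries and the `extract_sdoh` helper are hypothetical stand-ins for illustration only; the study's curated lexicons and matching rules are far richer.

```python
import re

# Hypothetical mini-lexicons, one per SS/SI subcategory of interest.
# (Illustrative entries only, not the lexicons used in the study.)
LEXICONS = {
    "social_support": ["supportive family", "lives with family", "close friends"],
    "social_isolation": ["lives alone", "lonely", "socially isolated"],
}

def extract_sdoh(note):
    """Map each subcategory to the lexicon terms matched in a clinical note.

    Matching is case-insensitive and requires word boundaries, so "lonely"
    does not fire inside an unrelated longer token.
    """
    text = note.lower()
    hits = {}
    for label, terms in LEXICONS.items():
        matched = [t for t in terms
                   if re.search(r"\b" + re.escape(t) + r"\b", text)]
        if matched:
            hits[label] = matched
    return hits
```

A system built this way mirrors the annotation guideline almost rule for rule, which is one plausible reason the study's RBS tracked the gold-standard corpus more closely than the LLM did; the trade-off is that mentions phrased outside the lexicon are missed entirely.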
{"title":"Correction to: Measuring interpersonal firearm violence: natural language processing methods to address limitations in criminal charge data.","authors":"","doi":"10.1093/jamia/ocae268","DOIUrl":"10.1093/jamia/ocae268","url":null,"abstract":"","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"264"},"PeriodicalIF":4.7,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648706/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142479223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}