BMJ Health & Care Informatics: Latest Articles

Explainable AI for mortality prediction: a comparative study using the MIMIC-III dataset.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-26 DOI: 10.1136/bmjhci-2024-101406
Niusha Shafiabady, Dave Akume, Mohammadreza Haghighat, Fareed Ud Din, Kabir Sattarshetty, Asif Karim, Jianlong Zhou, Ethar Alsharaydeh

Objectives: Predicting mortality is vital for tailoring treatments, improving care and reducing costs. Machine learning (ML) has shown strong potential, often outperforming traditional severity-of-illness scoring systems in intensive care units (ICUs). However, the black-box nature of ML limits adoption. This study evaluates the accuracy of several ML algorithms on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset and applies explainable artificial intelligence to identify variables influencing ICU mortality.

Methods: A retrospective cohort of 600 MIMIC-III patient records was analysed. ML algorithms tested included support vector machine (SVM), K-nearest neighbour (KNN), decision trees (DT), gradient boosting (GB), random forests (RF), Naive Bayes (NB), logistic regression (LR) and extra trees (ET). Models were assessed with threefold cross-validation using F1 Score, sensitivity, specificity, confusion matrix and accuracy. SHapley Additive exPlanations (SHAP) was applied to explain key factors influencing predictions.
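
The evaluation metrics named in the Methods can all be derived from the four cells of a confusion matrix. The following is a minimal sketch, not the authors' pipeline: the function names `binary_metrics` and `k_fold_indices` are hypothetical, the label convention (1 = died, 0 = survived) is an assumption, and the split is unshuffled for readability.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity and F1 from binary labels.

    Assumed convention (not stated in the abstract): 1 = died, 0 = survived.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def k_fold_indices(n: int, k: int = 3) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; every record is tested exactly once."""
    folds = [list(range(start, n, k)) for start in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Toy labels, invented for illustration only.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                         [1, 1, 0, 0, 0, 0, 0, 0, 0, 1])
splits = k_fold_indices(600, k=3)
```

With 600 records and three folds, each model is trained on 400 records and tested on the remaining 200 in every round.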

Results: Among the eight algorithms evaluated, ET and GB achieved the highest accuracy (98.33% and 98.23%, respectively) with F1-scores above 96%. SVM also performed strongly (97.50% accuracy, F1=94.34%). RF and DT yielded accuracies of 95.00% and 94.17%, respectively. NB reached 96.67% accuracy with 100% recall, while KNN and LR showed lower discriminative performance (76.67% and 75.83% accuracy, respectively). SHAP, LIME and Shapley values consistently identified hypertension, tumours and endocrine and digestive disease as the leading predictors of mortality.
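
To make the attribution idea concrete, the sketch below computes exact Shapley values for a toy risk function by averaging each feature's marginal contribution over all feature orderings. It is illustrative only: `toy_risk` and its numbers are invented, the feature names merely echo the abstract, and SHAP itself uses faster approximations rather than this brute-force enumeration.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for name in order:
            before = value(frozenset(present))
            present.add(name)
            totals[name] += value(frozenset(present)) - before
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(present: FrozenSet[str]) -> float:
    """Hypothetical mortality risk; all coefficients are invented."""
    risk = 0.10  # baseline risk with no conditions present
    if "hypertension" in present:
        risk += 0.20
    if "tumour" in present:
        risk += 0.30
    if "hypertension" in present and "tumour" in present:
        risk += 0.10  # interaction term, shared equally by both features
    return risk


names = ["hypertension", "tumour", "digestive_disease"]
phi = shapley_values(names, toy_risk)
```

The attributions sum to the difference between the full-feature risk and the baseline, which is the additivity property that gives SHAP its name. A feature that never changes the output, such as `digestive_disease` in this toy, receives an attribution of zero.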

Discussion: Findings highlight ML's potential to optimise ICU decision-making and support clinicians. However, reliance on retrospective data remains a limitation, and clinical validation is required.

Conclusion: ML algorithms are highly effective for mortality prediction, and explainability is key for trust and adoption. When combined, accuracy and interpretability enable ML to safely support informed ICU decisions and improve patient outcomes.

BMJ Health & Care Informatics, volume 33, issue 1. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12959059/pdf/
Citations: 0
Longitudinal multisource clinical model for early lung cancer risk stratification and screening.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-24 DOI: 10.1136/bmjhci-2025-101989
Chia-Hui Chien, Shih-Chuan Chang, Yung-Chun Chang, Yu-Chuan Li

Objectives: Lung cancer is the leading cause of cancer-related mortality worldwide, with poor prognosis largely due to late-stage diagnosis. Current screening methods such as low-dose CT face accessibility and cost barriers in resource-limited settings. This study develops a lightweight multichannel convolutional neural network for lung cancer screening support through longitudinal risk stratification using routine pre-diagnostic healthcare data.

Methods: We conducted a retrospective cohort study using Taiwan's National Health Insurance Research Database, comprising 99 615 individuals (575 lung cancer cases; 99 040 non-cancer controls). Diagnostic codes, medication records and medical orders within a 36-month observation window were extracted. Log-likelihood ratio feature selection was implemented to reduce dimensionality, achieving 99.8% reduction in computational requirements while retaining clinical relevance. A multichannel Convolutional Neural Network (CNN) architecture was designed to process these heterogeneous data modalities simultaneously.
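
The abstract does not give the formula behind its log-likelihood ratio feature selection. One common formulation is the G² statistic on a 2×2 contingency table of feature presence against case status; the sketch below assumes that formulation. The function names and the example counts are hypothetical, with only the group sizes (575 cases, 99 040 controls) taken from the text.

```python
import math
from typing import Dict, List, Tuple


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 table.

    k11: cases with the feature      k12: cases without it
    k21: controls with the feature   k22: controls without it
    """
    total = k11 + k12 + k21 + k22
    row_cases, row_controls = k11 + k12, k21 + k22
    col_with, col_without = k11 + k21, k12 + k22
    cells = (
        (k11, row_cases, col_with),
        (k12, row_cases, col_without),
        (k21, row_controls, col_with),
        (k22, row_controls, col_without),
    )
    g2 = 0.0
    for observed, row_total, col_total in cells:
        if observed > 0:  # a zero cell contributes nothing to the sum
            expected = row_total * col_total / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def top_k_features(
    counts: Dict[str, Tuple[int, int]], n_cases: int, n_controls: int, k: int
) -> List[str]:
    """Rank features by G^2; counts maps name -> (cases with, controls with)."""
    scored = {
        name: log_likelihood_ratio(c, n_cases - c, d, n_controls - d)
        for name, (c, d) in counts.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Invented counts for three hypothetical codes.
example_counts = {
    "chest_imaging_order": (300, 9000),
    "cough_diagnosis": (200, 20000),
    "vitamin_prescription": (58, 9990),
}
selected = top_k_features(example_counts, n_cases=575, n_controls=99040, k=2)
```

Keeping only the highest-scoring codes in each modality shrinks the input vocabulary, which is consistent with the large reduction in computational requirements the abstract reports.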

Results: The proposed method achieved an F₁-score of 0.5738, precision of 0.7149, Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.8316 and Area Under the Precision-Recall Curve (AUPRC) of 0.1617, outperforming baseline methods in precision and F₁-score. Ablation studies confirmed that medical orders provide primary predictive value, while medication features contribute limited discriminative signal in the pre-diagnostic phase. SHapley Additive exPlanations analysis revealed that routine healthcare utilisation patterns, rather than cancer-specific features, drive risk stratification.
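
Two quantities that the abstract leaves implicit can be estimated from the reported figures. This is a back-of-envelope sketch under stated assumptions: that F₁ and precision were measured for the cancer class at the same operating point, and that the evaluation data share the cohort's overall prevalence. Neither assumption is confirmed by the source.

```python
def recall_from_f1(f1: float, precision: float) -> float:
    """Invert F1 = 2PR / (P + R) to solve for recall R."""
    return f1 * precision / (2 * precision - f1)


# Reported values from the abstract.
reported_f1 = 0.5738
reported_precision = 0.7149
reported_auprc = 0.1617

implied_recall = recall_from_f1(reported_f1, reported_precision)

# A no-skill classifier's AUPRC is approximately the positive-class prevalence.
prevalence = 575 / 99615
auprc_lift = reported_auprc / prevalence

print(f"implied recall: {implied_recall:.4f}")
print(f"prevalence: {prevalence:.4%}, AUPRC lift over no-skill: {auprc_lift:.1f}x")
```

Under these assumptions the implied recall is about 0.48, and the AUPRC is roughly 28 times the no-skill level. This puts the low absolute AUPRC in context: with about 0.58% prevalence, even a strong model scores far below 1.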

Discussion: The lightweight architecture enables deployment in resource-constrained clinical environments while maintaining robust performance, offering potential as a preliminary screening tool to identify high-risk individuals for further diagnostic examination.

Conclusion: Efficient deep learning models using routine clinical data can facilitate lung cancer risk stratification and screening, providing a scalable solution for clinical implementation.

BMJ Health & Care Informatics, volume 33, issue 1. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12933799/pdf/
Citations: 0
Bringing AI to the OR: integrating a machine learning predictive model in the EHR - a pilot on user-friendliness.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-24 DOI: 10.1136/bmjhci-2025-101831
Sara Ben Hmido, Ewout Ingwersen, Houssam Abder Rahim, Martin van Maanen, Sander Groot, Matthijs Schakel, Annika Rausch, Geert Kazemier, Freek Daams

Objectives: Evaluate the technical integration and usability of an intraoperative predictive machine learning model for colorectal anastomotic leakage within the Epic electronic health records (EHRs) at a single academic centre, with outputs blinded.

Methods: The system used 28 data elements from patient records, intraoperative monitoring equipment and structured operating room (OR) observations. Data were collected every 15 min and processed in the cloud with encrypted, pseudonymised transfer. Usability was assessed using the System Usability Scale (SUS) and additional questions addressing access, clarity, responsiveness, workflow impact, safety and training needs. Convenience sampling was used, with all available OR staff involved in eligible procedures invited to participate.
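
For readers unfamiliar with the System Usability Scale, the standard scoring rule converts ten responses on a 1 to 5 scale into a 0 to 100 score. The sketch below implements that standard rule; the function name `sus_score` is hypothetical, and the study's additional questions are not covered.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one SUS questionnaire (ten items, each answered 1 to 5).

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    The summed contributions (0 to 40) are scaled by 2.5 to give 0 to 100.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1
        else:
            total += 5 - response
    return total * 2.5


best = sus_score([5, 1] * 5)      # strongest possible agreement pattern
neutral = sus_score([3] * 10)     # all neutral answers
worst = sus_score([1, 5] * 5)     # weakest possible pattern
```

Because the scaling is linear, a SUS score is not a percentage of satisfied users, so it is usually interpreted against published benchmarks rather than read directly.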

Results: 15 procedures (9 October 2024 to 6 March 2025) were included. Nine unique users responded (≈75% of 12 exposed; four surgeons, five OR assistants). The interface was accessed in all cases, and predictions were generated each time from all three sources. Mean SUS was 79.2 (SD 10.4; 95% CI 71.2 to 87.2). Diagnostic items favoured access speed and clarity; prediction responsiveness scored lower.
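
The reported confidence interval can be reproduced from the mean, SD and number of respondents using a Student's t interval. This is a sketch that assumes the authors used a t-based interval with n = 9; the critical value 2.306 is the standard two-sided 95% value for 8 degrees of freedom, and the function name is hypothetical.

```python
import math
from typing import Tuple


def t_confidence_interval(
    mean: float, sd: float, n: int, t_crit: float
) -> Tuple[float, float]:
    """Two-sided confidence interval for a mean: mean +/- t * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


T_CRIT_95_DF8 = 2.306  # two-sided 95% critical value, 8 degrees of freedom

low, high = t_confidence_interval(mean=79.2, sd=10.4, n=9, t_crit=T_CRIT_95_DF8)
response_rate = 9 / 12

print(f"95% CI: {low:.1f} to {high:.1f}")
print(f"response rate: {response_rate:.0%}")
```

The result matches the abstract's 71.2 to 87.2. The interval spans 16 points, which reflects how little precision nine respondents provide.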

Discussion: The system could be implemented and operated reliably within a single centre and EHR environment and was perceived as easy to use, despite the small sample size. However, findings may not generalise to other hospitals or EHRs without adaptation and further multi-site evaluation.

Conclusion: The system functioned reliably and was positively received, supporting readiness for activation in real-time clinical use and prospective evaluation. Future deployment should incorporate regulatory planning and a quality-management framework to monitor performance, safety and changes in model behaviour over time.

BMJ Health & Care Informatics, volume 33, issue 1. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12933791/pdf/
Citations: 0
Artificial intelligence translation in healthcare: an urgent call for evidence-informed policy frameworks.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-18 DOI: 10.1136/bmjhci-2025-102007
Chukwuebuka Anyaegbuna, Natasha Steele, April Shichu Liang, Stephen P Ma, Ivan Lopez, Nymisha Chilukuri, Kavita Patel, Kevin Schulman, Jonathan H Chen

The deployment of artificial intelligence (AI) translation tools in healthcare is accelerating rapidly, yet regulatory frameworks lag dangerously behind clinical practice. Recent data reveal that 57% of US physicians are already using or planning to adopt AI translation services within the next year. This creates a critical policy vacuum where clinicians deploy tools with variable performance across languages, risking patient safety and deepening health inequities. We examine the fractured regulatory landscape, document performance disparities between well-resourced and digitally under-represented languages, and argue for an urgent, evidence-informed policy framework centred on patient comprehension rather than linguistic accuracy. We delineate a risk-stratified validation approach comprising two distinct tracks: a 'Streamlined Pathway' for tool-language combinations with robust existing evidence (eg, Spanish) and a 'Standard Pathway' requiring independent, prospective validation for digitally under-represented languages (eg, Haitian Creole). To ensure accountability, we propose establishing oversight bodies within the U.S. Department of Health and Human Services (HHS) or the Food and Drug Administration (FDA) to mandate pre-deployment validation and post-market monitoring. Without such action, AI translation risks creating a two-tier system where the 25.7 million Americans with non-English language preferences receive dramatically different care quality based solely on the language they speak.

BMJ Health & Care Informatics, volume 33, issue 1. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12918658/pdf/
Citations: 0
By the book or beyond? Lessons on implementation fidelity in remote patient monitoring within a hybrid hospital-at-home feasibility study.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-18 DOI: 10.1136/bmjhci-2025-101832
Tatjana Sandreva Dreisig, Maria Larsen, Charlotte von Sydow, Thyge Nielsen, Thea K Fischer, Gritt Overbeck, Sarah Villadsen

Objectives: Hybrid hospital-at-home (HaH) models, combining remote patient monitoring (RPM) with home visits, offer an alternative to inpatient care. Yet, evidence on how RPM is delivered is limited. This paper reports a substudy embedded within a feasibility study, examining RPM quality in a hybrid HaH programme for patients with lower respiratory tract infections before a large trial.

Methods: We analysed 19 patient trajectories in a multimethod feasibility study (April 2022-May 2023). We hypothesised that effective RPM implementation enables the timely detection of deterioration, prompting patients' return to the hospital and preventing harm. Quality was assessed via implementation fidelity and patient safety. Fidelity referred to adherence to RPM protocols for virtual ward rounds and alert management. Data sources included telemedicine logs, electronic health records, clinical observations and clinician feedback.

Results: No severe adverse events occurred. Four patients (21%) returned to the hospital. A total of 48 virtual ward rounds were scheduled, and 46 (96%) were provided, 37 of which (77%) were conducted via video. Most RPM alerts were logged outside protocol-defined timeframes, as clinicians prioritised clinical action over documentation.
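
The denominators behind the reported percentages can be checked with simple arithmetic. The sketch below is an inference from the numbers, not a statement from the authors: it suggests that the 77% for video rounds is calculated against the 48 scheduled rounds rather than the 46 delivered ones. The helper name `pct` is hypothetical.

```python
def pct(part: int, whole: int) -> int:
    """Percentage of part in whole, rounded to the nearest integer."""
    return round(100 * part / whole)


scheduled_rounds = 48
delivered_rounds = 46
video_rounds = 37
patients = 19
returned_to_hospital = 4

delivered_share = pct(delivered_rounds, scheduled_rounds)
video_share_of_scheduled = pct(video_rounds, scheduled_rounds)
video_share_of_delivered = pct(video_rounds, delivered_rounds)
return_share = pct(returned_to_hospital, patients)

print(delivered_share, video_share_of_scheduled, video_share_of_delivered, return_share)
```

Only the scheduled-rounds denominator reproduces the reported 77%; measured against delivered rounds, the video share would be about 80%.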

Discussion: We demonstrate a gap between protocol assumptions and real-world clinical workflows. Ultimately, rigid performance metrics may overlook essential adaptive practices, underestimating true quality.

Conclusions: Despite protocol deviations, our findings suggest that RPM practices may have supported timely detection of deterioration during early-phase testing. This study underscores the need to balance protocol adherence with clinical flexibility, emphasising core functions over administrative compliance in complex interventions like hybrid HaH.

Trial registration number: NCT05087082.

BMJ Health & Care Informatics, volume 33, issue 1. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12918672/pdf/
Citations: 0
Early detection of female-specific cancers using longitudinal healthcare records with a multichannel convolutional neural network.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-12 DOI: 10.1136/bmjhci-2025-101874
Chia-Hui Chien, Shih-Chuan Chang, Yung-Chun Chang, Yu-Chuan Li

Objectives: Female-specific cancers, including breast, ovarian, cervical and uterine malignancies, lack comprehensive early detection approaches, particularly for ovarian and endometrial cancers where effective population-level screening remains limited. This study aimed to develop and validate a computational method for early detection of female-specific cancers using longitudinal healthcare records.

Methods: We developed a multichannel convolutional neural network (MCNN) to analyse 36-month pre-diagnostic healthcare records from Taiwan's National Health Insurance Research Database. The study included 19 954 female patients (596 cancer cases, 19 358 controls) from 1999 to 2013. Log-likelihood ratio feature selection identified the top 10 features across three data modalities (diagnostic codes, medications, medical orders). The six-channel architecture processed temporal patterns through stratified 10-fold cross-validation, with performance compared against nine baseline algorithms.
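
Stratification matters here because cases make up only about 3% of the cohort, so a plain random split could leave some folds with very few cancers. The sketch below shows one simple way to build stratified folds by dealing each class round-robin after shuffling. It is an illustration with a hypothetical function name and an arbitrary seed, not the authors' implementation.

```python
import random
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Assign record indices to k folds so each fold keeps the class ratio."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal shuffled indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Class sizes from the abstract: 596 cancer cases, 19 358 controls.
labels = [1] * 596 + [0] * 19358
folds = stratified_folds(labels, k=10)
cases_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each of the ten folds ends up with 59 or 60 cancer cases, so every held-out fold has a comparable number of positives.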

Results: MCNN achieved superior balanced performance with Macro-F₁ score of 0.8443, precision of 0.9135 and recall of 0.7978, outperforming traditional machine learning and deep learning approaches. Feature analysis revealed clinically relevant patterns including tamoxifen therapy, immunohistochemical procedures and cancer-specific diagnostic codes. SHapley Additive exPlanations (SHAP) interpretability analysis demonstrated the model's ability to identify pre-diagnostic phases through temporal healthcare utilisation patterns. Systematic feature selection reduced computational requirements by over 99%, enabling validation on Taiwan's population-scale National Health Insurance Research Database (NHIRD).
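
Macro-F₁ averages the per-class F₁ scores with equal weight, so the minority cancer class counts as much as the much larger control class. The sketch below shows the calculation on invented labels. The abstract does not say how its precision and recall were averaged, so the link to the reported figures is an assumption.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Invented example: 2 cases, 8 controls, one case missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

case_f1 = f1_for_class(y_true, y_pred, 1)
control_f1 = f1_for_class(y_true, y_pred, 0)
score = macro_f1(y_true, y_pred)
```

In this toy example accuracy is 90%, but the macro-F₁ is lower because the missed case halves the recall for the cancer class. A macro-averaged F₁ also need not equal the harmonic mean of macro-averaged precision and recall.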

Discussion: The multichannel deep learning approach enables unified early detection across four female cancer types using routine administrative data, addressing detection gaps for ovarian and endometrial cancers while providing complementary risk stratification for existing screening programmes.

Conclusion: Clinical implementation through electronic health record (EHR) integration offers practical pathways for accessible cancer risk assessment during routine healthcare encounters.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12911670/pdf/
Citations: 0
Clinical evaluation of MiADE: a natural language processing system for assisting structured diagnosis recording at the point of care.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-11 DOI: 10.1136/bmjhci-2025-101801
Mairead McErlean, Jack Ross, Jonathan Kossoff, Maisarah Amran, James Brandreth, Leilei Zhu, Gary Philippo, Wai Keong Wong, Folkert W Asselbergs, Richard J B Dobson, Yogini Jani, Enrico Costanza, Anoop Dinesh Shah

Objectives: To evaluate the usability, usefulness and impact of a novel point of care natural language processing (NLP) system, Medical information AI Data Extractor (MiADE), to assist structured diagnosis recording in electronic health records.

Methods: Mixed methods evaluation of the implementation of MiADE in a major National Health Service hospital, with surveys, interviews and observed outpatient consultations. The number of structured diagnoses recorded per outpatient encounter was compared before and after MiADE, and completeness of inpatient problem lists was evaluated using billing diagnoses as a gold standard.

Results: Eighty-five clinicians consented to the study and were given access to MiADE; 24 of them used it to receive structured data suggestions during the study period. Baseline survey data and observations showed wide variation in structured data recording despite clinicians considering it to be important. Half of postimplementation survey respondents considered MiADE to be 'very' or 'moderately' useful. Multilevel quasi-Poisson regression of 12 309 outpatient encounters (accounting for time and clustering by clinician) estimated that the post-MiADE period was associated with 23.7% more diagnoses recorded per encounter. No improvement was seen in the inpatient setting.
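
A Poisson-family regression with a log link reports effects as rate ratios, so a figure such as "23.7% more diagnoses per encounter" corresponds to a rate ratio of 1.237. The sketch below shows the conversion in both directions. The coefficient it prints is back-calculated from the published percentage for illustration; it is not a value reported by the authors, and the function names are hypothetical.

```python
import math


def percent_change_from_coef(beta):
    """Convert a log-link regression coefficient into a percentage change in rate."""
    return (math.exp(beta) - 1.0) * 100.0


def coef_from_percent_change(pct):
    """Inverse conversion: percentage change in rate to a log-scale coefficient."""
    return math.log(1.0 + pct / 100.0)


# 23.7% is the estimate quoted in the abstract; beta is derived from it here.
beta = coef_from_percent_change(23.7)
print(f"implied log rate ratio: {beta:.4f}")
print(f"round trip: {percent_change_from_coef(beta):.1f}% more diagnoses per encounter")
```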

Discussion: Structured recording of key information such as diagnoses using a clinical terminology is essential for safe, efficient patient care, but is currently done incompletely because it is time-consuming for clinicians. MiADE was associated with an increase in outpatient structured diagnosis recording despite low uptake of the tool.

Conclusion: Point of care NLP using MiADE can potentially improve structured data recording, but further development and better clinician engagement are needed to maximise its impact.

Trial registration number: ISRCTN58300671.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12911726/pdf/
Citations: 0
Mechanistic interpretability of reinforcement learning in Medicaid care coordination.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-10 DOI: 10.1136/bmjhci-2025-101935
Sanjay Basu, Sadiq Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra, Rajaie Batniji

Objective: To expose reasoning pathways of a reinforcement learning policy for Medicaid care coordination, develop an error taxonomy and implement fairness-aware guardrails.

Design: Retrospective interpretability audit using attention analysis, Shapley explanations, sparse autoencoder feature discovery and blinded clinician adjudication.

Setting: Medicaid care coordination programmes in Washington, Virginia and Ohio (July 2023-June 2025).

Participants: 250 000 intervention decisions; 200 divergent cases reviewed by five clinicians.

Main outcome measures: Calibrated harm prediction; algorithmic clearance and residual harm rates; error taxonomy frequencies; subgroup fairness metrics.

Results: The conformal model achieved area under the receiver operating characteristic curve of 0.80 (95% CI 0.78 to 0.82), clearing 89.5% (95% CI 88.9% to 90.1%) of decisions with 1.22% (95% CI 1.14% to 1.30%) residual harm versus 6.67% (95% CI 6.02% to 7.32%) for flagged decisions. Sparse autoencoders identified seven reasoning motifs linking social determinants to clinical cascades. The error taxonomy revealed premise errors (48%, 95% CI 41% to 55%), calibration failures (27%, 95% CI 21% to 33%) and contextual blind spots (25%, 95% CI 19% to 31%). Divergence was higher for telehealth visits (11.2%) and behavioural health patients (10.7% vs 6.9%, p<0.001). Fairness optimisation reduced race-group disparity by 37% (95% CI 22% to 48%) and sex-group disparity by 28% (95% CI 14% to 39%). Reviewers rated 23% (95% CI 17% to 29%) of overridden recommendations as well-matched, confirming appropriate human oversight.
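
The cleared and flagged harm rates above can be combined into an overall rate with a simple weighted average. The sketch below does this under one explicit assumption that the abstract does not state: every decision that is not cleared (the remaining 10.5%) counts as flagged. The function name is hypothetical and the outputs are illustrative arithmetic, not results reported by the study.

```python
def blended_harm_rate(cleared_share, cleared_harm, flagged_harm):
    """Weighted average harm rate across cleared and flagged decisions.

    Assumption: decisions are either cleared or flagged, with no third group.
    """
    flagged_share = 1.0 - cleared_share
    return cleared_share * cleared_harm + flagged_share * flagged_harm


# Point estimates quoted in the abstract.
overall = blended_harm_rate(cleared_share=0.895, cleared_harm=0.0122, flagged_harm=0.0667)
risk_ratio = 0.0667 / 0.0122

print(f"implied overall harm rate: {overall:.2%}")
print(f"flagged vs cleared risk ratio: {risk_ratio:.2f}")
```

Read this way, flagged decisions carry roughly five times the residual harm of cleared ones, which is the separation the clearance step is meant to provide.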

Conclusions: Mechanistic interpretability transforms opaque algorithmic assistance into auditable decision support, providing a governance scaffold for clinical artificial intelligence deployment.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12911724/pdf/
Citations: 0
Data pipeline quality: development and validation of a quality assessment tool for data-driven algorithms and artificial intelligence in healthcare.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-02 DOI: 10.1136/bmjhci-2025-101608
Eris van Twist, Brian van Winden, Rogier de Jonge, H Rob Taal, Matthijs de Hoog, Alfred Schouten, David Tax, Jan Willem Kuiper

Objectives: To develop and validate a tool for standardised quality assessment of data-driven algorithms in healthcare, focusing on the underlying data pipeline.

Methods: Data Assessment Tool for Algorithm Critical Appraisal and Robust Evidence (DATA-CARE) was iteratively developed from the established Quality In Prognosis Studies framework, selected after reviewing 10 existing quality assessment tools for observational and artificial intelligence studies. DATA-CARE evaluates five quality domains of the data pipeline: study population, data, algorithm, outcome and report transparency. Each domain comprises three to five quality criteria. With a total score of 75 points, study quality is categorised as low (<45), moderate (45-59) or high (≥60). DATA-CARE was validated during a systematic review on data-driven algorithms using continuous physiological monitoring data within the paediatric intensive care unit. Two independent reviewers performed quality assessment using DATA-CARE of included studies. Tool validation was evaluated using inter-rater agreement and intraclass correlation coefficient (ICC).
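
The scoring bands described above translate directly into a small lookup. The following is a minimal sketch: the thresholds (<45, 45-59, ≥60, out of 75) come from the abstract, while the function name, the assumed minimum of 0 and the handling of non-integer totals between 59 and 60 (treated as moderate) are assumptions made for illustration.

```python
def data_care_category(total_score, max_score=75):
    """Map a DATA-CARE total score to the quality band named in the abstract."""
    # Assumption: valid totals run from 0 to max_score inclusive.
    if not 0 <= total_score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    if total_score < 45:
        return "low"
    if total_score < 60:
        return "moderate"
    return "high"


for score in (30, 45, 59, 60, 75):
    print(score, data_care_category(score))
```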

Results: DATA-CARE demonstrated robust inter-rater agreement (93.5%) with ICC 0.98 (95% CI 0.96 to 0.99). Of 3858 screened studies, 31 were reviewed in the use case, describing diverse algorithms. Studies were predominantly of low (32.3%) to moderate (41.9%) quality; 25.8% were of high quality.
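
As a quick consistency check, the reported percentages can be converted back into study counts out of the 31 reviewed studies. The counts below are inferred by rounding and are not stated in the abstract; the variable names are illustrative.

```python
N_REVIEWED = 31  # studies reviewed in the use case, as reported
reported_pct = {"low": 32.3, "moderate": 41.9, "high": 25.8}

# Inferred counts: nearest whole number of studies for each reported percentage.
inferred = {band: round(N_REVIEWED * pct / 100) for band, pct in reported_pct.items()}

for band, count in inferred.items():
    recomputed = round(100 * count / N_REVIEWED, 1)
    print(f"{band}: {count} studies ({recomputed}% recomputed vs {reported_pct[band]}% reported)")

print("total:", sum(inferred.values()))
```

The inferred counts sum to 31 and reproduce the reported percentages to one decimal place, so the three bands appear to account for every reviewed study.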

Discussion: The predominance of low-to-moderate quality studies reveals critical barriers to clinical implementation of data-driven algorithms, including low-quality data capture and processing, a lack of validation strategies and non-transparent reporting of findings.

Conclusions: DATA-CARE allows standardised and reliable critical appraisal for a wide variety of algorithms, addressing current gaps in standardised and reproducible algorithm development.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12878310/pdf/
Citations: 0
Electronic health record use in physiotherapy: adoption, perceived relevance and utilisation patterns-a cross-sectional study.
IF 4.4 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-02-01 DOI: 10.1136/bmjhci-2025-101604
Sara Luísa Vaz, Pedro Vargues de Aguiar, Carla Pereira, André Moreira-Rosário

Objectives: This study aims to assess electronic health record (EHR) use in physiotherapy, identify factors influencing its adoption and evaluate physiotherapists' perceptions of its relevance.

Methods: A cross-sectional study was conducted with 138 licensed physiotherapists recruited through digital platforms. EHR utilisation was evaluated using the RSEFisio scale, a validated instrument designed to capture multiple dimensions of EHR use in physiotherapy. Descriptive and inferential statistical analyses were applied to examine usage patterns and contextual factors. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology guidelines.

Results: The EHR utilisation rate was 78.3%. Higher utilisation was significantly associated with adequate time allocated for documentation (p=0.001), systematic recording for all patients (p=0.013) and multi-professional access to records (p=0.043). The frequency of documentation was closely linked to the perceived clinical relevance of recorded items.

Discussion: Despite the high level of EHR utilisation, physiotherapy documentation remains incomplete and driven by perceived clinical relevance. Utilisation improves with adequate time, standardised recording and interprofessional access. Inconsistent data quality undermines continuity of care and limits secondary uses, including artificial intelligence integration. Strengthening documentation is essential to improve clinical workflows and support data-driven decision-making in physiotherapy.

Conclusion: Physiotherapists recognise the value of comprehensive documentation, but report limited time and incomplete records. The disconnect between awareness and practice highlights the need for practical, system-level strategies to support more consistent and effective EHR use in physiotherapy.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863365/pdf/
Citations: 0