Pub Date: 2026-03-23; DOI: 10.1136/bmjhci-2025-101877
Janan Arslan, Kurt Benke, Sebastian Andres Cajas Ordones, Rowell Castro, Leo Anthony Celi, Gustavo Adolfo Cruz Suarez, Roben Delos Reyes, Justin Engelmann, Ari Ercole, Almog Hilel, Leo Kinyera, Maximin Lange, Torleif Markussen Lunde, Mackenzie J Meni, Felipe Ocampo Osorio, Anna E Premo, Jana Sedlakova, Pritika Vig
We present BODHI (Balanced, Open-minded, Diagnostic, Humble, and Inquisitive), an engineering framework for curiosity-driven and humble clinical decision support artificial intelligence (AI) systems. Despite growing capabilities, large language models (LLMs) often express inappropriate confidence, conflating statistical pattern recognition with genuine medical understanding. BODHI addresses this through a dual reflective architecture that (1) decomposes epistemic uncertainty into task-specific dimensions and (2) constrains model responses using virtue-based stance rules derived from a Virtue Activation Matrix. We validate the framework through controlled evaluation on 200 clinical vignettes from HealthBench Hard, assessing GPT-4o-mini and GPT-4.1-mini across 5 random seeds (2000 total observations). Statistical analysis included bootstrap resampling, paired t-tests and effect size computation. BODHI improved overall clinical response quality (GPT-4.1-mini: +16.6 pp, p<0.0001, Cohen's d=11.56; GPT-4o-mini: +2.2 pp, p<0.0001, Cohen's d=1.56) and achieved very large effect sizes on curiosity (context-seeking rate: Cohen's d=16.38 and 19.54) and humility (hedging: d=5.80 for GPT-4.1-mini) metrics. Crucially, 97.3% of GPT-4.1-mini responses and 73.5% of GPT-4o-mini responses included appropriate clarifying questions, compared with 7.8% and 0.0% at baseline, demonstrating the framework's effectiveness in eliciting information-gathering behaviour. The findings suggest that LLMs can be reliably constrained to operate within epistemic boundaries when provided with structured uncertainty decomposition and virtue-aligned response rules, offering a pathway towards safer clinical AI deployment.
Title: Engineering framework for curiosity-driven and humble AI in clinical decision support. Journal: BMJ Health & Care Informatics, 33(1). Published: 2026-03-23. DOI URL: https://doi.org/10.1136/bmjhci-2025-101877
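The abstract above names three statistical tools: paired t-tests, Cohen's d and bootstrap resampling. The following is a minimal sketch of how those quantities can be computed for paired per-seed scores, using only the Python standard library. The score lists are hypothetical illustration values, not data from the study, and the function names are assumptions introduced here; the abstract does not say which variant of Cohen's d the authors used.

```python
import math
import random
import statistics


def paired_differences(baseline, treated):
    """Per-pair differences (treated minus baseline)."""
    return [t - b for b, t in zip(baseline, treated)]


def paired_cohens_d(baseline, treated):
    """Paired-samples Cohen's d: mean difference divided by the SD of the differences."""
    diffs = paired_differences(baseline, treated)
    return statistics.mean(diffs) / statistics.stdev(diffs)


def paired_t_statistic(baseline, treated):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = paired_differences(baseline, treated)
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def bootstrap_mean_diff_ci(baseline, treated, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference."""
    rng = random.Random(seed)
    diffs = paired_differences(baseline, treated)
    means = sorted(
        statistics.mean(rng.choice(diffs) for _ in diffs)
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-seed scores (5 seeds), for illustration only.
baseline_scores = [0.10, 0.12, 0.11, 0.09, 0.13]
framework_scores = [0.27, 0.28, 0.29, 0.25, 0.30]

print("Cohen's d:", paired_cohens_d(baseline_scores, framework_scores))
print("t statistic:", paired_t_statistic(baseline_scores, framework_scores))
print("95% bootstrap CI:", bootstrap_mean_diff_ci(baseline_scores, framework_scores))
```

Because the paired d divides by the standard deviation of the differences, a small run-to-run spread produces a very large d even when the absolute improvement is modest, which is worth keeping in mind when reading effect sizes above 10.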
Objectives: The strain placed on emergency medical services by the COVID-19 pandemic worsened both the quality of emergency care and mortality from out-of-hospital cardiac arrest (OHCA). However, no study has validated prognostic prediction models for OHCA patients on data collected during the pandemic. We sought to develop a pre-hospital prediction model for neurological outcome at 1 month in adult patients following OHCA using machine learning, and to validate it on data collected during the pandemic.
Methods: Data from 1 740 212 adult OHCA patients in a nationwide registry in Japan between 2005 and 2019 were used to develop the prediction model. Neurological outcome at 1 month after OHCA was the prediction target. We validated the model using data from 96 525 patients collected during the pandemic, from March to December 2020.
Results: The optimal predictive factors could all be ascertained at the emergency scene. Although neurological outcomes were less favourable during the pandemic than in the corresponding pre-pandemic periods, the model performed well and was well calibrated: the area under the receiver operating characteristic curve was 0.94 before and 0.95 during the pandemic.
Discussion: The model will improve the quality of emergency care by enabling accurate triage and swift preparation for advanced life-saving care, even when services are overwhelmed by disasters.
Conclusion: We developed a machine learning prediction model for neurological outcome in OHCA patients that remained applicable under the conditions of the COVID-19 pandemic.
Title: Practical adaptability of a pre-hospital prognostic prediction model for patients following out-of-hospital cardiac arrest during the COVID-19 pandemic. Authors: Masahiro Nishi, Akira Shikuma, Eiichiro Uchino, Satoaki Matoba, Yonemoto Naohiro, Yoshio Tahara, Takanori Ikeda. Journal: BMJ Health & Care Informatics, 33(1). Published: 2026-03-18. DOI: 10.1136/bmjhci-2025-101802
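The OHCA abstract reports discrimination as the area under the receiver operating characteristic curve (AUC). As a rough illustration of what that number measures, the sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The labels and scores are hypothetical; this is not the authors' model or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the outcome of interest, 0 otherwise.
    scores: predicted probabilities or risk scores, same order as labels.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 3 favourable outcomes (1) and 3 unfavourable (0).
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.4, 0.3, 0.5, 0.8, 0.2]
print(roc_auc(example_labels, example_scores))  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of cases, so for a registry-sized dataset a rank-based implementation would be the practical choice; the pairwise form is used here only because it makes the definition explicit.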
Pub Date: 2026-03-18; DOI: 10.1136/bmjhci-2025-101510
Shraboni Ghosal, Mengying Zhang, Angeliki Bogosian, Elizabeth Marsh, Trudi Edginton, Emma Stanmore, Siobhan O'Connor
Background: Mindfulness can positively impact physical and mental health, but face-to-face programmes are limited by poor accessibility, availability and cost. Virtual reality (VR) offers immersive audiovisual environments that could improve mindfulness practice.
Objectives: To evaluate commercially available VR apps related to mindfulness.
Methods: App stores and relevant online platforms were searched for VR apps related to mindfulness. Results were screened against eligibility criteria and relevant data extracted. Six raters used the Mobile App Rating Scale (MARS) to assess the quality of VR apps.
Results: Five VR apps related to mindfulness were included, that is, Headspace XR, Hoame, Innerworld, Maloka and TRIPP. These provided access to meditative and mindfulness sessions, guided by virtual instructors in some cases and situated in a range of virtual landscapes accompanied by sound or music. TRIPP received the highest average MARS score (4.1), followed by Hoame (3.8), Maloka (3.6), Headspace XR (3.4) and Innerworld (3.3). Most VR apps scored the highest on functionality (3.4-4.2), while the information category scored the lowest (3.1-3.7). The intraclass correlation was moderate.
Conclusion: This review provides insights into the availability and quality of VR apps related to mindfulness. Only five such apps were identified, with a moderate overall MARS quality score (3.62/5.00). These may provide a convenient and immersive way to access and engage in regular mindfulness practice, particularly for novices. Rigorous scientific research should assess the effectiveness of these VR apps in improving physical and mental health through immersive digital mindfulness practice.
Title: Virtual reality-based mindfulness applications: a commercial health app review. Journal: BMJ Health & Care Informatics, 33(1). Published: 2026-03-18.
Pub Date: 2026-03-13; DOI: 10.1136/bmjhci-2025-101587
Richard David Barker, Refik Gökmen, Daisy Naylor, James T Teo
Objective: Digital health apps and patient portals are proposed as part of the drive from 'analogue to digital' care for the National Health Service (NHS) 10-Year Plan. Without mitigation strategies, digital inequalities could arise as a result, and more evidence is needed to understand how to mitigate this.
Methods: As part of an equality impact assessment, a retrospective cross-sectional analysis was conducted examining patient portal activation among patients invited to outpatient appointments at two large south-east London Hospital Trusts between 1 May and 1 November 2024.
Results: Of the 503 688 patients invited to attend outpatient clinics during the study period, 52.7% had activated the patient portal. Availability of email contact details was the strongest determinant of onboarding likelihood (OR 10.86). Multivariate logistic regression models showed that the following groups were less likely to activate the patient portal: men (OR 0.84), individuals at the extremes of age (71-80 or 11-20 years), those of mixed or undefined ethnicity (OR 0.58), those of black ethnicity (OR 0.62) and those with the highest degree of socioeconomic deprivation (Index of Multiple Deprivation group 1; OR 0.68).
Conclusion: This large-scale roll-out of a digital health portal provides empirical evidence of factors that drive digital inequalities for patients of two major London NHS Trusts. The observed disparities across demographic and socioeconomic dimensions, together with the strong effect of simple, reliable digital contact mechanisms such as email, highlight the risk that digital healthcare initiatives may inadvertently produce new types of inequalities.
Title: Unlocking digital health: inequalities in the adoption of a patient portal. Journal: BMJ Health & Care Informatics, 33(1). Published: 2026-03-13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12993344/pdf/
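The patient portal abstract reports its findings as odds ratios (ORs). As a hedged illustration of how an OR is read, the sketch below computes a crude (unadjusted) odds ratio with a Woolf-type 95% confidence interval from a 2x2 table. The counts are hypothetical and chosen for easy arithmetic; the study itself used multivariate logistic regression, where each adjusted OR is the exponential of a fitted coefficient rather than a ratio of raw counts.

```python
import math


def odds_ratio_with_ci(exposed_yes, exposed_no, unexposed_yes, unexposed_no, z=1.96):
    """Crude odds ratio and Woolf (log-scale) confidence interval from a 2x2 table.

    exposed_yes / exposed_no: outcome present / absent in the exposed group.
    unexposed_yes / unexposed_no: outcome present / absent in the unexposed group.
    All four cells must be non-zero.
    """
    odds_ratio = (exposed_yes * unexposed_no) / (exposed_no * unexposed_yes)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exposed_yes + 1 / exposed_no + 1 / unexposed_yes + 1 / unexposed_no
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) - (-z) * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: portal activation by availability of an email address.
# With email: 300 activated, 100 not. Without email: 50 activated, 150 not.
crude_or, ci_low, ci_high = odds_ratio_with_ci(300, 100, 50, 150)
print(crude_or, ci_low, ci_high)  # the odds ratio here is 9.0
```

An OR below 1, such as the 0.84 reported for men, means lower odds of activation than the reference group; an OR above 1, such as the 10.86 reported for email availability, means higher odds.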
Pub Date: 2026-03-12; DOI: 10.1136/bmjhci-2025-101949
Elena Lammila-Escalera, Gabriele Kerr, Geva Greenfield, Benedict Hayhoe, Natalie Brewer, Carla Hearsum, Grazia Antonacci, Natasha Dsouza, Azeem Majeed, Ana Luisa Neves
Objectives: To evaluate the effect of the National Health Service (NHS) Federated Data Platform (FDP) Inpatient (IP) Care Coordination Solution (CCS) digital scheduling tool on elective theatre utilisation.
Methods: An interrupted time series assessed changes in theatre utilisation and cancellations following tool adoption (January 2022). Weekly data spanned 90 weeks (April 2021-December 2023). Outcomes included weekly median theatre utilisation (actual, booked and bookings per session) and the percentage of cancelled bookings. Models incorporated a 5-week lag and estimated level (step-change) and trend (slope) effects.
Results: Postintervention level and trend increases were observed for booked (β=4.40, p=0.045; β=0.26, p=0.002) and actual (β=3.98, p=0.064; β=0.23, p=0.006) utilisation. Bookings per session showed a significant level increase (β=0.34, p=0.002) with no trend change (β=0.00, p=0.790). Across the postintervention period, compared with counterfactual estimates, booked and actual utilisation were 15.0% (95% CI 13.4% to 16.5%, p<0.0001) and 12.2% (95% CI 10.8% to 13.5%, p<0.0001) higher, while bookings per session were 10.9% (95% CI 9.5% to 12.4%, p<0.0001) higher. Significant positive effects were observed for urology, general surgery, gynaecology, plastic surgery and ophthalmology. A significant upward trend in cancellation rates was associated with the introduction of the tool (β=2.1, p=0.001).
Discussion: Findings suggest that centralised digital scheduling tools can improve theatre utilisation by enabling more efficient use of existing capacity through improved scheduling visibility. Future research should explore differences in specialty-level usage and long-term sustainability of gains.
Conclusion: The introduction of the NHS FDP IP CCS product was associated with improved elective theatre utilisation.
Title: Impact of the Federated Data Platform's digital surgery scheduling system on elective theatre utilisation at an NHS Trust: an interrupted time series analysis. Journal: BMJ Health & Care Informatics, 33(1). Published: 2026-03-12. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12983765/pdf/
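The theatre utilisation abstract describes an interrupted time series with a level (step-change) effect and a trend (slope) effect. One common way to estimate both is segmented regression, sketched below with ordinary least squares in plain Python. This is an assumption about the general technique, not a reproduction of the authors' model: it omits the 5-week lag and any handling of autocorrelation, and the intervention week and coefficients in the synthetic series are hypothetical.

```python
def design_row(week, start):
    """Regressors: intercept, baseline trend, level change, trend change."""
    post = 1.0 if week >= start else 0.0
    return [1.0, float(week), post, post * (week - start)]


def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_interrupted_time_series(values, start):
    """Least-squares fit via the normal equations (X'X) b = X'y.

    Returns [intercept, baseline_trend, level_change, trend_change].
    """
    rows = [design_row(week, start) for week in range(len(values))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic, noise-free weekly series: 90 weeks, hypothetical intervention at week 40.
START_WEEK = 40
series = [
    70.0 + 0.05 * w + (4.0 + 0.25 * (w - START_WEEK) if w >= START_WEEK else 0.0)
    for w in range(90)
]
intercept, base_trend, level_change, trend_change = fit_interrupted_time_series(
    series, START_WEEK
)
print(level_change, trend_change)
```

In this parameterisation the third coefficient corresponds to the level effect and the fourth to the trend effect, which is how pairs such as β=4.40 and β=0.26 in the abstract would be read. The counterfactual for a postintervention week is the fitted intercept plus baseline trend alone.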
Objectives: To evaluate the ability of large language models (LLMs) to simulate multidisciplinary team (MDT) decision-making in colorectal cancer, a malignancy that often requires complex treatment planning.
Methods: We retrospectively analysed 1423 colorectal cancer cases discussed at MDT meetings at Peking University Cancer Hospital between January 2023 and December 2024. Three LLMs (OpenAI o3-mini-2025-01-31, DeepSeek-R1 671b and Qwen qwq-plus-2025-03-05) were tested for their ability to replicate MDT recommendations using a standardised treatment categorisation framework. Each case was processed three times per model; only cases with consistent outputs across all three runs were included. Concordance between AI-generated decisions and expert MDT consensus was assessed using agreement percentages and Cohen's kappa.
Results: O3 demonstrated the highest intramodel stability, with an agreement rate of 81.0% (Fleiss' kappa=0.794), yielding 1153 cases with consistent outputs. Concordance with MDT consensus was comparable across the three models, ranging from 62.5% to 65.4%. Multivariable analysis of O3 outputs identified treatment-naïve status, non-metastatic disease and colon tumour location as independent predictors of higher concordance with experts.
Discussion: LLMs showed fair overall agreement with expert MDT decisions, with stronger performance in standardised and less complex clinical scenarios. Areas of higher concordance included treatment-naïve non-metastatic colon cancer, treated non-metastatic rectal cancer and treated non-metastatic colon cancer.
Conclusion: LLMs can partially replicate expert MDT recommendations in colorectal cancer. Their integration into clinical workflows should aim to complement, rather than replace, human expertise.
Title: Comparison of large language models and expert multidisciplinary team decisions in colorectal cancer. Authors: Boyang Qu, Longhao Cao, Chen Wu, Yongjiu Chen, Tingting Sun, Junpeng Pei, Lei Huang, Xiaotong Hou, Dawei Li, Aiwen Wu. Journal: BMJ Health & Care Informatics, 33(1). Published: 2026-03-10. DOI: 10.1136/bmjhci-2025-101780. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12983819/pdf/
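The colorectal cancer abstract measures concordance with agreement percentages and Cohen's kappa. The sketch below shows how kappa corrects raw agreement for the agreement expected by chance, given each rater's category frequencies. The treatment labels and the two decision lists are hypothetical and far simpler than the study's categorisation framework.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per case."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical decisions for 8 cases.
mdt_decisions = ["surgery", "surgery", "chemo", "chemo",
                 "surgery", "chemo", "surgery", "chemo"]
llm_decisions = ["surgery", "surgery", "chemo", "surgery",
                 "surgery", "chemo", "chemo", "chemo"]

print(cohens_kappa(mdt_decisions, llm_decisions))  # 0.5: 75% raw agreement, 50% by chance
```

The example illustrates why the two metrics are reported together: raw agreement of 75% corresponds to a kappa of only 0.5 once chance agreement is removed.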
Pub Date: 2026-03-04; DOI: 10.1136/bmjhci-2025-101723
Sini Kuitunen, Bruna L Alves, Hanna Peitsoma, Muhammad Z Saleem, Lotta Schepel, Anna-Riia Holmström
Objectives: To explore the effects of bidirectional interoperability between electronic health records (EHR) and smart infusion pumps on medication errors (MEs), system compliance, workflow efficiency and economic outcomes.
Methods: This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 criteria. The literature search of Scopus, MEDLINE (Ovid), Web of Science, Cumulative Index to Nursing and Allied Health Literature, and Evidence-Based Medicine Reviews was conducted on 3 October 2024. Peer-reviewed studies considering bidirectional interoperability between EHR and smart pumps in hospitals were included. Study selection according to a predetermined Population, Intervention, Comparison(s) and Outcome tool, data extraction and evidence quality assessment (Joanna Briggs Institute critical appraisal tool assessment and the Grading of Recommendations Assessment, Development and Evaluation approach) were carried out by two individual reviewers.
Results: Seven studies from the USA, published between 2011 and 2024, were included. The studies used variable designs to compare the effects of bidirectional interoperability between smart infusion pumps and EHR on system compliance and workflow efficiency (n=6 studies), MEs (n=3) and economic outcomes (n=2) before and after implementation. The observed effects were mainly positive; however, evidence quality was low because of the observational nature of studies.
Discussion: The interoperability between EHR systems and smart infusion pumps remains a relatively novel research topic. Evidence is geographically concentrated, limiting its generalisability to different healthcare systems, regulatory environments and technology adoption patterns.
Conclusion: While bidirectional interoperability may reduce MEs, improve system compliance and workflow efficiency and enhance hospitals' charging accuracy of provided care, future studies should prioritise controlled designs, robust data and economic outcomes to justify the investment.
PROSPERO registration number: CRD42024538518.
Title: Effects of a bidirectional interoperability between electronic health records and smart infusion pumps in hospital settings: a systematic review. Journal: BMJ Health & Care Informatics, 33(1). Published: 2026-03-04. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12970065/pdf/
Pub Date: 2026-03-03 DOI: 10.1136/bmjhci-2025-101896
Lily C Taylor, Niels Peek, Ari Ercole, Georgios Lyratzopoulos, Juliet A Usher-Smith
Objectives: To develop recommendations to inform development and integration of predictive digital health and artificial intelligence tools in primary care.
Methods: Recommendation development involved two stages. The initial scoping phase comprised an umbrella review to identify barriers to implementation for risk prediction tools in primary care. The consensus phase involved a workshop with 22 stakeholders. The draft recommendations were then refined via a stakeholder survey completed by 13 participants and three online meetings attended by 14 individuals to generate the final output.
Results: The umbrella review included 12 reviews and identified 15 barriers to implementation of risk prediction models, including lack of integration with electronic health records and poor interoperability across them. The final recommendations include 14 core features of risk prediction models and tools, including the need for codesign with clinicians and the public and integration with digital infrastructure and workflows.
Discussion: These findings particularly emphasise the value of early engagement with key stakeholders and health record system providers, and a need for shared understanding of the needs of end-users.
Conclusions: We have developed recommendations detailing 14 key characteristics for a digital risk prediction model to be successfully used in primary care settings. This profile should be used to guide development of new risk prediction tools and is also applicable more widely to other digital health innovations within primary care. Future research should work to resolve the identified system-level barriers to implementation.
Enabling digital multifactorial risk assessment in primary care: an umbrella review and recommendations for design and implementation. Lily C Taylor, Niels Peek, Ari Ercole, Georgios Lyratzopoulos, Juliet A Usher-Smith. BMJ Health & Care Informatics, volume 33, issue 1. Published 2026-03-03. DOI: 10.1136/bmjhci-2025-101896. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12958886/pdf/
Pub Date: 2026-02-27 DOI: 10.1136/bmjhci-2025-101761
Lukasz S Wylezinski, Jamieson D Gray, Charles F Spurlock
Objectives: To develop and evaluate a machine learning (ML) model that predicts Crohn's disease (CD) patients responsible for the top quartile of healthcare spending.
Methods: De-identified commercial claims (2016-2018) from ~267 000 continuously enrolled members in a Midwestern state were analysed, including 994 CD cases. Monthly data for each patient was aggregated into data points that included healthcare spending amounts, encounter interactions, demographics and binary flags for diagnoses, procedures and drug codes. Seven algorithm families were tuned using five-fold cross-validation (January 2016 to September 2017) and tested prospectively (November 2017 to February 2018). Monthly performance evaluations assessed the accuracy of predicting high-cost healthcare spending, using 4-month and 1-month historical cost analyses for comparison.
Results: ML models predicted an average of 80% of the dollars spent by top-quartile members during the 4-month evaluation period, compared with 67% for the 4-month baseline and 62% for the prior-month benchmark. The models identified an average of 51 new members entering the high-cost group each month, nearly double the yield of the 4-month historical method. These ML models more accurately anticipated inpatient encounters that drove excess spending.
Discussion: Claims-based ML offers actionable lead time for payers and clinicians to enhance monitoring, adjust biological therapy or schedule elective care before emergency admissions occur. Because this framework relies exclusively on standard claim fields, it can be quickly extended to other episodic, high-variance conditions.
Conclusion: Prospectively tested, claims-only ML models enhance short-term risk stratification in CD by identifying future high-cost patients. Future studies should confirm the clinical impact and cost savings, and ensure equitable performance across diverse populations.
Machine learning framework for early identification of high-spending Crohn's disease patients using administrative claims. Lukasz S Wylezinski, Jamieson D Gray, Charles F Spurlock. BMJ Health & Care Informatics, volume 33, issue 1. Published 2026-02-27. DOI: 10.1136/bmjhci-2025-101761. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12970074/pdf/
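The Crohn's disease abstract above reports the share of top-quartile dollars that each method anticipated. The abstract does not define that metric, so the following is only a minimal sketch of one plausible reading: among members who actually landed in the top spending quartile, what fraction of their dollars belonged to members flagged in advance? The function names, the toy data and the quartile cut-off rule are all assumptions made for illustration; they are not the authors' implementation.

```python
from typing import Dict, Set


def top_quartile(spend: Dict[str, float]) -> Set[str]:
    """Members whose spending ranks in the top 25% (at least one member)."""
    ranked = sorted(spend, key=lambda m: spend[m], reverse=True)
    return set(ranked[: max(1, len(ranked) // 4)])


def dollars_captured(flagged: Set[str], actual: Dict[str, float]) -> float:
    """Fraction of actual top-quartile dollars spent by flagged members."""
    top = top_quartile(actual)
    total = sum(actual[m] for m in top)
    hit = sum(actual[m] for m in top & flagged)
    return hit / total if total else 0.0


# Hypothetical toy data: spending in the evaluation month and the prior month.
actual = {"a": 1000.0, "b": 800.0, "c": 100.0, "d": 50.0,
          "e": 40.0, "f": 30.0, "g": 20.0, "h": 10.0}
prior = {"a": 900.0, "b": 20.0, "c": 700.0, "d": 50.0,
         "e": 40.0, "f": 30.0, "g": 20.0, "h": 10.0}

baseline_flags = top_quartile(prior)   # historical-cost benchmark (assumed form)
model_flags = {"a", "b"}               # stand-in for a model's predictions

baseline_share = dollars_captured(baseline_flags, actual)
model_share = dollars_captured(model_flags, actual)
print(f"baseline={baseline_share:.2f} model={model_share:.2f}")
```

In this toy case the prior-month benchmark misses member "b", who moves into the high-cost group only in the evaluation month, which is the kind of new entrant the abstract says the models identified more often than the historical methods.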
Pub Date: 2026-02-26 DOI: 10.1136/bmjhci-2025-101678
Success Kamuhanda, Rebecca Melisa Nakitandwe, Dafala Kezimbira, Clare Kahuma Allelua, Michael Kateregga, James Serubugo, Irene Wanyana
Objectives: Interoperability, the seamless exchange and use of data across digital health systems, is essential for integrated, efficient healthcare delivery. However, evidence on its adoption in Africa remains limited and fragmented. This scoping review aimed to map existing evidence, identify key barriers and highlight emerging opportunities for strengthening interoperability across all levels on the continent.
Methods: We conducted the review in line with the Joanna Briggs Institute (JBI) methodology and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) framework. Searches were carried out across PubMed/MEDLINE, IEEE Xplore, African Index Medicus and Google Scholar, focusing on English-language publications from January 2010 to March 2025. Eligible sources included peer-reviewed articles, conference papers and relevant policy documents.
Results: Sixteen studies met the inclusion criteria. The findings revealed wide disparities in the adoption of interoperability standards, with countries such as Uganda, South Africa and Kenya showing greater momentum due to national digital strategies and health information exchange initiatives. Common challenges included limited technical capacity, fragmented infrastructure and inadequate regulatory support. However, there were encouraging developments around the use of open-source platforms like OpenHIE, regional policy alignment through the African Union Digital Health Strategy and growing public-private partnerships.
Discussion: Progress remains uneven, shaped by each country's digital maturity, workforce capabilities and policy landscape. Capacity-building and better alignment with global standards could bridge current gaps.
Conclusion: To build resilient digital health systems, African countries must strengthen governance, invest in infrastructure and develop technical expertise. Future work should assess how interoperability influences clinical care and explore regional readiness for cross-border data exchange.
Adoption, barriers and opportunities of interoperability and eHealth standards in Africa: a scoping review. Success Kamuhanda, Rebecca Melisa Nakitandwe, Dafala Kezimbira, Clare Kahuma Allelua, Michael Kateregga, James Serubugo, Irene Wanyana. BMJ Health & Care Informatics, volume 33, issue 1. Published 2026-02-26. DOI: 10.1136/bmjhci-2025-101678. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12958898/pdf/