Pub Date: 2026-01-07; eCollection Date: 2026-01-01; DOI: 10.1371/journal.pdig.0001179
Nicholas Dietrich, David McShannon, Mark F Rzepka
Traditional deep learning models for lung sound analysis require large, labeled datasets, whereas multimodal large language models (LLMs) may offer a flexible, prompt-based alternative. This study aimed to evaluate the utility of a general-purpose multimodal LLM, GPT-4o, for lung sound classification from mel-spectrograms and assess whether a few-shot prompt approach improves performance over zero-shot prompting. Using the ICBHI 2017 Respiratory Sound Database, 6898 annotated respiratory cycles were converted into mel-spectrograms. GPT-4o was prompted to classify each spectrogram using both zero-shot and few-shot strategies. Model outputs were evaluated against ground truth labels using performance metrics including accuracy, precision, recall, and F1-score. Few-shot prompting improved overall accuracy (0.363 vs. 0.320) and yielded modest gains in precision (0.316 vs. 0.283), recall (0.300 vs. 0.287), and F1-score (0.308 vs. 0.285) across labels. McNemar's test indicated a statistically significant difference in performance between prompting strategies (p < 0.001). Model repeatability analysis demonstrated high agreement (κ = 0.76-0.88; agreement: 89-96%), indicating excellent consistency. GPT-4o demonstrated limited but statistically significant performance gains using few-shot prompting for lung sound classification. While current performance remains insufficient for clinical deployment, this prompt-based approach provides a baseline for spectrogram-based multimodal tasks and a foundation for future exploration of prompt-based multimodal inference.
Title: Evaluating few-shot prompting for spectrogram-based lung sound classification using a multimodal language model. Journal: PLOS Digital Health 5(1): e0001179. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12779135/pdf/
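The lung sound study above compares zero-shot and few-shot predictions on the same respiratory cycles using McNemar's test. A minimal sketch of such a paired comparison is given here, assuming the exact (binomial) form of the test; the abstract does not say which variant the authors used. The function names, labels, and predictions are hypothetical and chosen only for illustration.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count cases where exactly one of the two strategies is correct."""
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a, only_b)
    # Under the null hypothesis each discordant pair favours either
    # strategy with probability 0.5, so the smaller count is Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented toy data (not from the study); labels are illustrative only.
y_true = ["normal", "crackle", "wheeze", "both"]
zero_shot = ["normal", "normal", "wheeze", "normal"]
few_shot = ["normal", "crackle", "normal", "both"]

only_zero, only_few = discordant_counts(y_true, zero_shot, few_shot)
p_value = mcnemar_exact(only_zero, only_few)
print(only_zero, only_few, p_value)
```

Only the discordant pairs enter the test, which is why two strategies with similar overall accuracy can still differ significantly when evaluated on thousands of paired cases.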
Pub Date: 2026-01-07; eCollection Date: 2026-01-01; DOI: 10.1371/journal.pdig.0001119
Bo Jiang, Weijun Situ, Zhichao Feng, Jianmin Yuan, Yina Wang, Xiaofan Chen, Xiong Wu, Kai Deng, Haitao Yang, Xiao Xiao, Xi Guo, Junjiao Hu
This study aimed to develop and validate an artificial intelligence (AI) model for the non-invasive early detection of dyslipidemia using liver chemical shift-encoded MRI (CSE-MRI) fat maps. An automated AI pipeline was developed to predict abnormalities in four lipid indicators: triglyceride, total cholesterol, low-density lipoprotein cholesterol, and high-density lipoprotein cholesterol. The study utilized 1,757 liver CSE-MRI fat images from 89 patients who underwent MRI scans and contemporaneous blood lipid testing. Transfer learning was applied using several pre-trained networks, including ResNet18, MobileNet, DenseNet, AlexNet, and SqueezeNet. Model performance was evaluated via 8-fold cross-validation, with the optimal model further assessed on a held-out test set using confusion matrices and derived metrics. Significant performance differences were observed among models. The optimal model, based on ResNet18, demonstrated high accuracy in the internal validation set. On the independent test set, this model achieved accuracies of 0.853 for triglyceride, 0.833 for total cholesterol, 0.937 for low-density lipoprotein cholesterol, and 0.936 for high-density lipoprotein cholesterol, with corresponding F1-Scores of 0.885, 0.571, 0.886, and 0.897. The AI model based on liver CSE-MRI fat maps shows high accuracy and generalization in predicting abnormalities for three key lipid indices, validating its potential as an early warning tool for dyslipidemia. Expanding the training dataset could further enhance performance for all lipid indices.
Title: Development and validation of an artificial intelligence model based on liver CSE-MRI fat maps for predicting dyslipidemia. Journal: PLOS Digital Health 5(1): e0001119. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12779152/pdf/
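The dyslipidemia study above reports a total cholesterol accuracy of 0.833 alongside an F1-score of 0.571. A gap of this kind typically appears when the positive class is rare. The sketch below shows how both metrics are derived from one confusion matrix. The counts are hypothetical, picked only to reproduce a similar pattern, and are not the study's confusion matrix; the function name is also an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (NOT from the study): few positives, many true negatives.
metrics = binary_metrics(tp=4, fp=3, fn=3, tn=26)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because true negatives count toward accuracy but not toward F1, a model can look accurate on an imbalanced label while still missing a large share of the abnormal cases.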
Pub Date: 2026-01-06; eCollection Date: 2026-01-01; DOI: 10.1371/journal.pdig.0001154
Grace Williamson, Toslima Khatun, Kate King, Amos Simms, Simon Dymond, Laura Goodwin, Ewan Carr, Nicola T Fear, Dominic Murphy, Daniel Leightley
Frontline occupations, including military, healthcare, and first responders, often involve frequent exposure to traumatic events, increasing the risk of substance use disorders (SUDs). Research has shown that those in high-intensity occupations are at higher risk of developing SUDs compared to the general population. Women face unique experiences related to substance use, including greater functional impairment and barriers to treatment access. Yet, understanding of the effectiveness of digital health technologies in addressing substance use among women in frontline occupations is limited. This systematic review evaluates the effectiveness of digital health interventions in reducing substance use among women in frontline roles. Four databases (PsycINFO, Ovid MEDLINE, Embase, PsycArticles) were searched for English-language full-text articles (2007-2024) that (1) evaluated a digital intervention designed to reduce substance use, (2) reported changes in substance use outcomes such as frequency, intensity or duration, using validated tools, (3) included current or former frontline public service workers, and (4) included women as the primary target population or as a subgroup within the sample. Thirteen papers met inclusion criteria, focusing on eight distinct web and mobile-based interventions for alcohol, tobacco and illicit substances. Most studies (n = 11) reported substantial post-intervention reductions in alcohol and tobacco use, although results for PTSD symptoms, illicit drug use, and quality of life were mixed. This review highlights the potential of digital health interventions for reducing substance use but underscores significant gaps in research. The scarcity of studies focused on women, the small and heterogeneous samples, and the focus on veterans limit the generalisability of the findings to women in frontline roles. These gaps present a pressing challenge in understanding gender-specific digital intervention efficacy. Future research should prioritise larger, representative samples of women across diverse frontline occupations to drive the development of digital technologies tailored to the unique challenges faced by women in these roles.
Title: Digital health interventions for women in frontline public service roles: A systematic review of effectiveness in reducing substance use. Journal: PLOS Digital Health 5(1): e0001154. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12773817/pdf/
Pub Date: 2026-01-06; eCollection Date: 2026-01-01; DOI: 10.1371/journal.pdig.0001172
Jennifer K Bertrand, Margaret L McNeely, Jack Bates, Joshua Joy, Jenil Kanani, Victor E Ezeugwu, Puneeta Tandon
Physical performance tests such as the 30-second Sit-to-Stand (30s-STS), Timed Up and Go (TUG), and Short Physical Performance Battery (SPPB) are widely used to assess physical function in older adults and are predictive of key health outcomes. However, their routine use in clinical practice is limited by time, resource, and personnel constraints. This study aimed to validate the automated scoring of physical performance assessments using a mobile, markerless motion capture (MMC) app compared to scoring by a certified exercise physiologist (CEP), and to quantify the rate and reasons for technology-related data loss. A total of 228 adults (mean age = 61.6 ± 11.9 years) with at least one chronic medical condition were enrolled. Participants completed seven performance assessments: 30s-STS, TUG, and all components of the SPPB (Side-by-Side, Semi-Tandem and Tandem balance stands, 5-times Sit-to-Stand (5xSTS), and Gait Speed). All tests were scored simultaneously by a CEP and the MMC app using a Light Detection and Ranging (LiDAR)-enabled iPad. Agreement was assessed using intraclass correlation coefficients (ICCs) and weighted Cohen's kappa. Agreement between the MMC app and CEP was good to excellent for all assessments. ICCs ranged from 0.812 (Tandem Stand) to 0.995 (5xSTS). The overall SPPB score showed almost perfect agreement (κ = 0.808). Perfect agreement with no variability was observed for the Side-by-Side and Semi-Tandem balance tests. The overall technology-related data loss rate was low (3.1%), with the most common issue being poor motion tracking quality (1.3%). Automated scoring of physical performance tests using a self-contained MMC app demonstrated high agreement with expert scoring and low data loss in a cohort of participants with a range of chronic medical conditions. These findings support the use of MMC-enabled mobile applications for scalable, accessible, and objective assessment of physical function in clinical settings, with future potential for remote and asynchronous use.
Title: Validation of a markerless motion capture app for automated scoring of sit-to-stand, timed up and go, and short physical performance battery tests in adults with chronic disease. Journal: PLOS Digital Health 5(1): e0001172. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12773808/pdf/
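The motion capture study above assesses agreement on the ordinal SPPB score with weighted Cohen's kappa. The sketch below shows one way to compute such a statistic for two raters, assuming linear disagreement weights; the abstract does not state whether linear or quadratic weights were used. The function name and the example scores are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linear-weighted Cohen's kappa for two raters on ordered categories."""
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater A score, rater B score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            num += weight * observed[i][j]
            den += weight * row[i] * col[j]  # disagreement expected by chance
    if den == 0:
        return float("nan")  # no variability in the ratings: kappa is undefined
    return 1.0 - num / den


# Hypothetical scores from an app and a human assessor (not study data).
app_scores = [0, 1, 2]
expert_scores = [0, 1, 1]
print(weighted_kappa(app_scores, expert_scores, categories=[0, 1, 2]))
```

The undefined case is relevant to the reported results: when both raters give every participant the same score, as in the balance tests with "no variability", chance-expected disagreement is zero and kappa cannot be computed even though agreement is perfect.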
Pub Date: 2026-01-06; eCollection Date: 2026-01-01; DOI: 10.1371/journal.pdig.0001192
Maryann Rogers, Lindsay Hedden, Kimberlyn McGrail, Michael R Law
Upon the emergence of COVID-19, virtual alternatives to in-person care developed quickly to meet the need of physicians to maintain medical distancing from patients. Virtual care has since become a mainstay in the landscape of primary care, with many physicians providing both virtual and in-person visit options within their practice. However, due to its rapid development, questions have been raised regarding the quality of virtual care compared to its in-person alternative, particularly in terms of prescribing appropriateness. Thus, we examined whether global prescribing patterns differed between virtual and in-person physician visits following the onset of the COVID-19 pandemic. We conducted a scoping review of global literature with narrative synthesis to assess whether and how prescribing patterns differed between virtual and in-person care. This review revealed mixed findings, with the majority of studies reporting no significant difference in medication or antibiotic prescribing rates. Some weak evidence suggested virtual care may be associated with greater adherence to clinical guidelines. However, the predominance of United States-based studies and methodological limitations precluded strong conclusions, particularly for the Canadian context. Our scoping review found no consensus in the global literature on how prescribing patterns differ between virtual and in-person care. The methodological weaknesses and limited generalizability of the existing body of evidence highlight the need for further high-quality research in a broader range of settings.
Title: The impact of virtual care on drug prescribing practices: A scoping review. Journal: PLOS Digital Health 5(1): e0001192. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12773812/pdf/
Pub Date: 2026-01-06; eCollection Date: 2026-01-01; DOI: 10.1371/journal.pdig.0001193
Laurel O'Connor, Leah Dunkel, Andrew C Weitz, Allan Walkey, Peter K Lindenauer, Apurv Soni
Digital health technologies (DHTs) expand healthcare access, improve care coordination, and reduce costs. However, integrating these tools into care faces complex barriers. Understanding the perspectives of health system leaders is essential for developing sustainable DHTs. The objective of this project is to explore the experiences and priorities of health system stakeholders regarding the implementation of DHTs. The study team conducted semi-structured interviews with 12 stakeholders from diverse U.S. health systems, including clinical, operational, and executive leadership. Interviewees were selected using purposeful and snowball sampling. Interviews were transcribed and analyzed thematically using the Consolidated Framework for Implementation Research (CFIR). A constant comparative coding process was used to identify and organize key themes. Participants viewed DHTs as a way to enhance healthcare access and efficiency and improve public health operations, especially in rural or underserved settings. However, several major adoption challenges emerged: (1) integrating DHTs into existing workflows and electronic health records is operationally burdensome; (2) digital care can introduce risks to quality, continuity, and equity; and (3) external factors (reimbursement policy, regulatory constraints, infrastructure investment) are critical to long-term adoption. Digital health is seen as essential to the future of healthcare delivery, but meaningful integration requires alignment across clinical, operational, and policy domains. Coordinated investment, regulatory reform, and robust data infrastructure are needed to ensure DHTs are scalable and sustainable.
Title: Digitally delivered, systemically challenged: A qualitative study of health system readiness for digital care. Journal: PLOS Digital Health 5(1): e0001193. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12773818/pdf/
Pub Date: 2026-01-05; eCollection Date: 2026-01-01; DOI: 10.1371/journal.pdig.0001013
Sebastián Andrés Cajas Ordóñez, Rowell Castro, Leo Anthony Celi, Roben Delos Reyes, Justin Engelmann, Ari Ercole, Almog Hilel, Mahima Kalla, Leo Kinyera, Maximin Lange, Torleif Markussen Lunde, Mackenzie J Meni, Anna E Premo, Jana Sedlakova
Contemporary medical AI systems exhibit a critical vulnerability: they deliver confident predictions without mechanisms to express uncertainty or acknowledge limitations, leading to dangerous overreliance in clinical settings. This paper introduces the BODHI (Bridging, Open, Discerning, Humble, Inquiring) framework, a dual-reflective architecture grounded in two essential epistemic virtues, curiosity and humility, as foundational design principles for healthcare AI. Curiosity drives systems to actively explore diagnostic uncertainty, seek additional information when faced with ambiguous presentations, and recognize when training distributions fail to match clinical reality. Humility provides complementary restraint, enabling uncertainty quantification, boundary recognition, and appropriate deference to human expertise. We demonstrate how these virtues function synergistically in a dynamic feedback loop, preventing both reckless exploration and excessive caution while supporting collaborative clinical decision-making. Drawing from psychological theories of curiosity and cross-species evidence of epistemic humility, we argue that these capacities represent fundamental biological design principles essential for systems operating in high-stakes, uncertain environments. The BODHI framework addresses systemic failures in medical AI deployment, from biased training data to institutional workflow pressures, by embedding uncertainty awareness and collaborative restraint into foundational system architecture. Key implementation features include calibrated confidence measures, out-of-distribution detection, curiosity-driven escalation protocols, and transparency mechanisms that adapt to clinical context. Rather than pursuing algorithmic perfection through pure optimization, we advocate for human-AI partnerships that enhance clinical reasoning through mutual accountability and calibrated trust. This approach represents a paradigm shift from overconfident automation toward collaborative systems that embody the wisdom to pause, reflect, and defer when appropriate.
Title: Beyond overconfidence: Embedding curiosity and humility for ethical medical AI. Journal: PLOS Digital Health 5(1): e0001013. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12768375/pdf/
Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1371/journal.pdig.0001177
Jamboor K Vishwanatha, Allison Christian, Usha Sambamoorthi, Erika L Thompson, Katie Stinson, Toufeeq Ahmed Syed
[This corrects the article DOI: 10.1371/journal.pdig.0000288.].
Title: Correction: Community perspectives on AI/ML and health equity: AIM-AHEAD nationwide stakeholder listening sessions. PLOS Digital Health 5(1): e0001177. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758787/pdf/
Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1371/journal.pdig.0001170
Rose Wing Lai So, Kit Ying Chan, Christopher Chi Wai Cheng, Ngan Yin Chan, Shirley Xin Li, Joey Wing Yan Chan, Steven Wai Ho Chau, Yun Kwok Wing, Tim Man Ho Li
Digital cognitive behavioral therapy for insomnia (dCBT-I) is effective in treating insomnia, but adherence remains a major challenge in real-world applications. Machine learning (ML) offers potential in predicting healthcare utilization. This study applied ML techniques to predict adherence to dCBT-I based on participant baseline characteristics. This pilot real-world study included 75 individuals (69% female; 41% aged 35-44 years) with insomnia symptoms (Insomnia Severity Index, ISI ≥ 8) who participated in a 28-day chatbot-delivered dCBT-I program. ML models, including logistic regression with elastic-net penalty, support vector machine, random forest, and gradient boosting, analyzed participant baseline characteristics to predict adherence to dCBT-I in terms of session completion, usage duration, and response volume. These models were tuned using grid search and evaluated with cross-validation. The synthetic minority over-sampling technique was applied to address class imbalance in the training set. Baseline depressive symptoms were the most predictive of non-adherence. Higher depressive symptoms were associated with shorter overall usage duration (β = -3.57, 95% CI: -5.82 to -1.33, p = .002). Longer sleep onset latency and wake time after sleep onset on the previous night increased the number of responses and lengthened usage duration on the following day (β = 0.01-0.05, p < .05). No significant associations were found between daytime or bedtime usage and sleep parameters for that specific night. ML models predicted overall adherence, with AUCs of 0.65-0.91 (p < .05). ML models also predicted next-day adherence, with AUCs of 0.56-0.74 (p < .05). This real-world study demonstrates the potential of ML to predict user adherence to dCBT-I and provides clinical insights for personalizing sleep-focused treatments. The study also investigated daily usage and adherence patterns in dCBT-I to predict next-day adherence.
Title: Predicting adherence to fully-automated, chatbot-delivered digital cognitive behavioral therapy for insomnia (dCBT-I) using machine learning: A pilot real-world study. PLOS Digital Health 5(1): e0001170. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758786/pdf/
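The dCBT-I abstract above states that the synthetic minority over-sampling technique (SMOTE) was applied to the training set only. The abstract gives no implementation details, so the following is a minimal standard-library sketch of that idea, not the authors' code. The function names `smote` and `oversample_training_fold`, the neighbor count `k`, and the toy data are all hypothetical and chosen for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic rows by interpolating between minority neighbors."""
    if rng is None:
        rng = random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base row (the base itself excluded)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # random position on the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def oversample_training_fold(X_train, y_train, minority_label=1, k=3, seed=0):
    """Balance one training fold; the held-out fold must never pass through here."""
    rng = random.Random(seed)
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_majority = len(y_train) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote(minority, n_new, k=k, rng=rng)
    # Return new lists so the caller's original fold is left unchanged
    return X_train + new_rows, y_train + [minority_label] * len(new_rows)


# Toy training fold (hypothetical data): four majority rows, two minority rows
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y_train = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = oversample_training_fold(X_train, y_train)
```

Oversampling inside each training fold, rather than before the cross-validation split, keeps synthetic rows derived from a held-out sample from leaking into the data used to fit the model, which would otherwise inflate the reported AUCs.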
Pub Date : 2026-01-02 eCollection Date: 2026-01-01 DOI: 10.1371/journal.pdig.0000863
Tungamirirai Simbini, Emma Adimado, Samuel Adjorlolo, Lorena Guerrero-Torres, Prashanth Srinivas, Simukai Zizhou, Taddese Zerfu
Digital Health Interventions (DHIs) refer to discrete technological functionalities designed to achieve specific objectives in addressing health system challenges. These interventions are considered tools for strengthening health systems, particularly in low- and middle-income countries. This study consolidates findings from Ethiopia, Ghana, and Zimbabwe, examining how three distinct digital health applications with varying intervention components implemented in primary healthcare settings contribute to health system strengthening. The interventions analyzed include Ethiopia's District Health Information System 2 (DHIS2), Ghana's District Health Information Management System (DHIMS) and the Lightwave Health Information Management System (LHIMS), and Zimbabwe's Impilo Electronic Health Record (E-HR) system. In Ethiopia, DHIS2 enhanced health system accountability and data quality by streamlining district-level data aggregation, reporting, and performance monitoring. This led to more informed decision-making and improved resource distribution. In Ghana, DHIMS functions as a public health-level DHI, facilitating national data-driven performance monitoring, while LHIMS operates at the patient level, supporting patient tracking and management and improving patient workflows and resource tracking. However, a lack of interoperability between these two systems has led to data duplication challenges. Zimbabwe's Impilo E-HR, a patient-level DHI, has streamlined clinical workflows, improved information sharing, and enhanced decision-making at the point of care. Despite these successes, challenges persist across the three contexts: infrastructure limitations, high staff turnover, and insufficient user technical capacity. Interoperability issues, particularly in Ghana and Ethiopia, hinder seamless data exchange, while sustainability concerns such as funding gaps and inadequate government support undermine the systems' full potential. 
The study findings demonstrate that investments in DHIs in primary healthcare may not result in health systems strengthening without addressing baseline conditions for their implementation and sustainability.
Title: Digital health interventions in strengthening primary healthcare systems in Sub-Saharan Africa: Insights from Ethiopia, Ghana, and Zimbabwe. PLOS Digital Health 5(1): e0000863. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758792/pdf/