Pub Date: 2025-12-10; eCollection Date: 2025-12-01; DOI: 10.1371/journal.pdig.0000955
T'ng Chang Kwok, Chao Chen, Jayaprakash Veeravalli, Carol A C Coupland, Edmund Juszczak, Jonathan Garibaldi, Kirsten Mitchell, Kate L Francis, Christopher J D McKinlay, Brett J Manley, Don Sharkey
Decision-making in perinatal management of extremely preterm infants is challenging. Mortality prediction tools may support decision-making. We used population-based, routinely entered electronic patient record data from 25,902 infants born at 23+0 to 27+6 weeks' gestation and admitted to 185 English and Welsh neonatal units from 2010 to 2020 to develop and internally validate an online tool to predict mortality before neonatal discharge. After comparing nine machine learning approaches, we developed an explainable tool based on stepwise backward logistic regression (https://premoutcome.shinyapps.io/Death/). The tool demonstrated good discrimination (area under the receiver operating characteristic curve 0.746; 95% confidence interval 0.729-0.762) and calibration, with superior net benefit across probability thresholds of 10%-70%. Our tool also demonstrated better calibration and utility than previously published models. Acceptable performance was demonstrated in a multinational external validation cohort of preterm infants. This tool may be useful to support high-risk perinatal decision-making following further evaluation.
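The "net benefit across probability thresholds" reported above comes from decision curve analysis. As a minimal sketch of the standard definition (true-positive rate per patient minus false-positive rate per patient weighted by the threshold odds), the following uses invented toy outcomes and predicted risks; the function names and data are illustrative assumptions, not the authors' code or results.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every case whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data (1 = died before discharge), for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.80, 0.10, 0.30, 0.60, 0.20, 0.05, 0.40, 0.35, 0.15, 0.25]

for pt in (0.1, 0.3, 0.5, 0.7):
    model = net_benefit(y_true, y_prob, pt)
    treat_all = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.1f}  model={model:.3f}  treat-all={treat_all:.3f}")
```

A model shows "superior net benefit" at a threshold when its value exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.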
Title: Developing and validating an explainable digital mortality prediction tool for extremely preterm infants. PLOS Digital Health 4(12): e0000955. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12694798/pdf/
Pub Date: 2025-12-04; eCollection Date: 2025-12-01; DOI: 10.1371/journal.pdig.0001106
Moran Sorka, Alon Gorenshtein, Dvir Aran, Shahar Shelly
Large language models (LLMs) have demonstrated impressive capabilities in medical domains, yet their ability to handle the specialized reasoning patterns required in clinical neurology warrants systematic evaluation. Neurological assessment presents distinctive challenges that combine anatomical localization, temporal pattern recognition, and nuanced symptom interpretation, cognitive processes that are specifically tested in board certification examinations. We developed a comprehensive benchmark comprising 305 questions from Israeli Board Certification Exams in Neurology and classified each along three dimensions of complexity: factual knowledge depth, clinical concept integration, and reasoning complexity. We evaluated ten LLMs of varying architectures and specializations using this benchmark, testing base models, retrieval-augmented generation (RAG) enhancement, and a novel multi-agent system. Our analysis revealed significant performance variation across models and methodologies. The OpenAI-o1 model achieved the highest base performance (90.9% accuracy), while specialized medical models performed surprisingly poorly (52.9% for Meditron-70B). RAG enhancement provided variable benefits across models: substantial improvements for mid-tier models like GPT-4o (80.5% to 87.3%) and smaller models, but limited effectiveness on the highest complexity questions regardless of model size. In contrast, our multi-agent framework, which decomposes neurological reasoning into specialized cognitive functions (question analysis, knowledge retrieval, answer synthesis, and validation), achieved dramatic improvements, especially for mid-range models. The LLaMA 3.3-70B-based agentic system reached 89.2% accuracy compared to 69.5% for its base model, with particularly substantial gains on level 3 complexity questions across all dimensions. External validation on MedQA revealed dataset-specific RAG effects: while RAG improved board certification performance, it showed minimal benefit on MedQA questions (LLaMA 3.3-70B: +1.4% vs +3.9% on board exams), reflecting alignment between our specialized neurology textbook and board examination content rather than the broader medical knowledge required for MedQA. Most notably, the multi-agent approach transformed inconsistent subspecialty performance into remarkably uniform excellence, effectively addressing the neurological reasoning challenges that persisted even with RAG enhancement. We further validated our approach using an independent dataset comprising 155 neurological cases extracted from MedQA. The results confirm that structured multi-agent approaches designed to emulate specialized cognitive processes significantly enhance complex medical reasoning, offering promising directions for AI assistance in challenging clinical contexts.
Title: A multi-agent approach to neurological clinical reasoning. PLOS Digital Health 4(12): e0001106. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12677565/pdf/
Pub Date: 2025-12-01; DOI: 10.1371/journal.pdig.0001109
Prajwal Ghimire, Keyoumars Ashkan
The term "artificial" implies an inherent dichotomy from the natural or organic. However, AI, as we know it, is a product of organic ingenuity: designed, implemented, and iteratively improved by human cognition. The very principles that underpin AI systems, from neural networks to decision-making algorithms, are inspired by the organic intelligence embedded in human neurobiology and evolutionary processes. The path from "organic" to "artificial" intelligence in digital health is neither mystical nor merely a matter of parameter count; it is fundamentally about organization and adaptation. Thus, the boundaries between "artificial" and "organic" are far less distinct than the nomenclature suggests.
Title: From artificial to organic: Rethinking the roots of intelligence for digital health. PLOS Digital Health 4(12): e0001109. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12668481/pdf/
Pub Date: 2025-11-26; eCollection Date: 2025-11-01; DOI: 10.1371/journal.pdig.0000951
John Tayu Lee, Sheng Hui Hsu, Vincent Cheng-Sheng Li, Kanya Anindya, Meng-Huan Chen, Charlotte Wang, Toby Kai-Bo Shen, Valerie Tzu Ning Liu, Hsiao-Hui Chen, Rifat Atun
Machine learning (ML) models are increasingly applied to predict body mass index (BMI) and related outcomes, yet their fairness across socioeconomic and caste groups remains uncertain, particularly in contexts of structural inequality. Using nationally representative data from more than 55,000 adults aged 45 years and older in the Longitudinal Ageing Study in India (LASI), we evaluated the accuracy and fairness of multiple ML algorithms (Random Forest, XGBoost, Gradient Boosting, LightGBM, Deep Neural Networks, and Deep Cross Networks) alongside logistic regression for predicting underweight, overweight, and central adiposity. Models were trained on 80% of the data and tested on 20%, with performance assessed using AUROC, accuracy, sensitivity, specificity, and precision. Fairness was evaluated through subgroup analyses across socioeconomic and caste groups and equity-based metrics such as Equalized Odds and Demographic Parity. Feature importance was examined using SHAP values, and bias-mitigation methods were implemented at pre-processing, in-processing, and post-processing stages. Tree-based models, particularly LightGBM and Gradient Boosting, achieved the highest AUROC values (0.79-0.84). Incorporating socioeconomic and health-related variables improved prediction, but fairness gaps persisted: performance declined for scheduled tribes and lower socioeconomic groups. SHAP analyses identified grip strength, gender, and residence as key drivers of prediction differences. Among mitigation strategies, Reject Option Classification and Equalized Odds Post-processing moderately reduced subgroup disparities but sometimes decreased overall performance, whereas other approaches yielded minimal gains. ML models can effectively predict obesity and adiposity risk in India, but addressing bias is essential for equitable application. Continued refinement of fairness-aware ML methods is needed to support inclusive and effective public-health decision-making.
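To make the two equity metrics named above concrete, here is a minimal sketch of how Demographic Parity and Equalized Odds gaps can be computed from binary predictions. It follows one common convention (largest between-group gap in selection rate, and the larger of the true-positive-rate and false-positive-rate gaps); the group labels and data are hypothetical, and this is not the study's own implementation.

```python
def rate(values):
    """Mean of a list of 0/1 values; NaN if the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def group_rates(y_true, y_pred, groups):
    """Selection rate, TPR, and FPR for each subgroup."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = {
            "selection_rate": rate([y_pred[i] for i in idx]),
            "tpr": rate([y_pred[i] for i in idx if y_true[i] == 1]),
            "fpr": rate([y_pred[i] for i in idx if y_true[i] == 0]),
        }
    return out


def demographic_parity_difference(rates):
    """Largest gap in positive-prediction rate between any two groups."""
    s = [r["selection_rate"] for r in rates.values()]
    return max(s) - min(s)


def equalized_odds_difference(rates):
    """Larger of the between-group TPR gap and FPR gap."""
    tpr = [r["tpr"] for r in rates.values()]
    fpr = [r["fpr"] for r in rates.values()]
    return max(max(tpr) - min(tpr), max(fpr) - min(fpr))


# Hypothetical toy data: two subgroups "A" and "B", four people each.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

rates = group_rates(y_true, y_pred, groups)
print(rates)
print("demographic parity difference:", demographic_parity_difference(rates))
print("equalized odds difference:", equalized_odds_difference(rates))
```

A value of 0 on either metric means the groups are treated identically by that criterion; larger values indicate a wider gap.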
Title: Evaluating algorithmic fairness of machine learning models in predicting underweight, overweight, and adiposity across socioeconomic and caste groups in India: evidence from the longitudinal ageing study in India. PLOS Digital Health 4(11): e0000951. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12654920/pdf/
Pub Date: 2025-11-24; eCollection Date: 2025-11-01; DOI: 10.1371/journal.pdig.0001075
R Constance Wiener, Bayan J Abuhalimeh
Digital health literacy helps individuals evaluate how online health information should influence their care and their consent to treatment. The purpose of this systematic review is to examine the digital health literacy level among adults in studies that used the eHealth Literacy Scale (eHEALS) as a measure of digital health literacy. The authors searched Google Scholar, PubMed, Scopus, and Web of Science for evidence following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Statement, 2020 (PRISMA). Articles were included if the researchers evaluated the level of digital health literacy using eHEALS and if the article was peer reviewed, written in English or accompanied by an English translation, and published between 2020 and 2025. The search initially identified 200 articles; 180 were excluded, leaving a sample of 20 publications. eHEALS scores, which can range from 8 to 40, had a weighted mean of 24.3 (95% CI: 17.1-31.6). The lowest mean score was 12.57 and the highest was 35.1. The highest eHEALS score was from a qualitative interview study. Nine other studies reported overall means ≥ 30. Three studies reported eHEALS scores below 20. Globally, there is a wide range of reported digital health literacy levels. It is critical that the public gains skill and confidence in digital health literacy for healthcare decisions. The results of this study provide evidence of a large range of digital health literacy.
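The abstract does not state how the pooled eHEALS mean was weighted. As a minimal sketch, assuming weighting by study sample size (a common but here unconfirmed choice), a pooled mean can be computed as below; the study means and sample sizes are hypothetical placeholders, not values from the review.

```python
def weighted_mean(means, weights):
    """Pooled mean of study-level means, weighted by the given weights."""
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(m * w for m, w in zip(means, weights)) / total


# Hypothetical study-level eHEALS means (possible range 8-40) and sample sizes.
study_means = [15.0, 25.0, 35.0]
sample_sizes = [100, 300, 100]

pooled = weighted_mean(study_means, sample_sizes)
print(f"pooled eHEALS mean: {pooled:.1f}")
```

Larger studies pull the pooled value toward their own mean, which is why a weighted mean can sit well away from the midpoint of the lowest and highest study means.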
Title: Evaluating adult digital health literacy, 2020-2025: A systematic review. PLOS Digital Health 4(11): e0001075. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12643288/pdf/
Pub Date: 2025-11-21; eCollection Date: 2025-11-01; DOI: 10.1371/journal.pdig.0001107
Shahmir H Ali, Hein Thu
Title: App fatigue in mHealth: Beyond improving apps, advance equity by meeting people where they are. PLOS Digital Health 4(11): e0001107. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12637926/pdf/
Pub Date: 2025-11-20; eCollection Date: 2025-11-01; DOI: 10.1371/journal.pdig.0001074
Breno Guerra Zancan, José Andery Carneiro, Caio Uehara Martins, Camila Tirapelli, Camila Porto Capel, Eliana Dantas da Costa, Hugo Gaêta-Araujo, José Augusto Baranauskas, Alessandra Alaniz Macedo
In the healthcare domain, images play a pivotal role in clinical diagnoses, treatment planning, surgical procedures, and epidemiological insights. Nevertheless, challenges such as limited experience among healthcare professionals, risk of misdiagnosis and subjective interpretation, and factors like stress and fatigue may jeopardize the precision with which patients are assessed. Professionals in the field of Dentistry face analogous challenges, given that distinguishing anatomical structures in dental imaging requires expert interpretation and precise analysis. Convolutional Neural Networks (CNNs) offer promising opportunities to analyze images during patient care and can enhance diagnostic accuracy and clinical decision-making, benefiting both patients and healthcare providers. Here, we aimed to develop a specialized analyzer for digital dental radiography that focuses on numbering teeth and detecting tooth cavities. The system is designed to achieve high precision, recall, accuracy, specificity, and F1-score to ensure that diagnosis is reliable and accurate. In this study, we specifically explore Inception-v3 and InceptionResNet-v2 to discern cavitated teeth and tooth positions in dental panoramic radiographic images (PANs). On the basis of 935 PANs sourced from routine patient care and annotated by dentists at the Faculty of Dentistry of Ribeirão Preto in Brazil, our approach achieved precision of 0.98, recall of 0.98, accuracy of 0.998, specificity of 0.999, and F1-score of 0.98 for tooth numbering. For identification of cavitated teeth, our approach reached precision of 0.96, recall of 0.91, accuracy of 0.94, specificity of 0.96, and F1-score of 0.94. By addressing the critical challenges and reaching high performance, our study serves as a benchmark that relates innovative research and real-world applications, fostering advancements in dental diagnosis. The performance reported herein demonstrates that our initiatives can modulate image analysis tasks and select a more suitable CNN for the job.
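The five metrics reported above all derive from the same confusion matrix. The sketch below shows the standard definitions for a binary task such as "cavitated vs. not cavitated"; the counts are hypothetical and chosen only for illustration, and the function name is an assumption rather than anything from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes every denominator is non-zero; real code should guard against that.
    """
    precision = tp / (tp + fp)            # of predicted positives, how many are right
    recall = tp / (tp + fn)               # of actual positives, how many are found
    specificity = tn / (tn + fp)          # of actual negatives, how many are cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for 200 teeth (100 cavitated, 100 sound).
metrics = classification_metrics(tp=91, fp=4, tn=96, fn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy and specificity both count true negatives, they can stay very high in tasks like tooth numbering where most candidate labels are negatives, which is one reason precision, recall, and F1 are reported alongside them.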
Title: AI-powered precision in dental radiographic analysis using tailored CNNs for tooth numbering and cavity detection. PLOS Digital Health 4(11): e0001074. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12633937/pdf/
Pub Date: 2025-11-20; eCollection Date: 2025-11-01; DOI: 10.1371/journal.pdig.0001093
Chibuike K Uwakwe, Ekanath Srihari Rangan, Satyajit Kumar, Georg Gutjahr, Xuhui Miao, Andrew W Brooks, Peter Maguire, Tejaswini Mishra, Lettie McGuire, Michael P Snyder
Despite the millions of individuals struggling with persistent symptoms, Long COVID has remained difficult to diagnose due to limited objective biomarkers, often leading to underdiagnosis or even misdiagnosis. To bridge this gap, we investigated the potential of utilizing wearable sensor data to aid in the diagnosis of Long COVID. We analyzed longitudinal heart rate (HR) data from 126 individuals with acute SARS-CoV-2 infections to develop machine learning models capable of predicting Long COVID status using derived HR features, symptom features, or a combination of both feature sets. The HR features were derived across six analytical categories, including time-domain, Poincaré nonlinear, raw signal, Kullback-Leibler (KL) divergence, variational mode decomposition (VMD), and the Shannon energy envelope (SEE), enabling the capture of heart rate dynamics over various temporal scales and the quantification of day-to-day shifts in HR distributions. The symptom features used in the final models included chest pain, vomiting, excessive sweating, memory loss, brain fog, heart palpitations, and loss of smell. The combined HR- and symptom-feature model demonstrated robust predictive performance, achieving an area under the Receiver Operating Characteristic curve (ROC-AUC) of 95.1% and an area under the Precision-Recall curve (PR-AUC) of 85.9%. These values represent a significant improvement of approximately 5% in both the ROC-AUC and PR-AUC over the symptoms-only model. At the population level, this improvement in discrimination could lead to clinically meaningful reductions in misclassification and improved patient outcomes, achieved through a non-invasive diagnostic tool. These findings suggest that wearable HR data could be used to derive an objective biomarker for Long COVID, thereby enhancing diagnostic precision.
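One of the feature categories above, Kullback-Leibler (KL) divergence, is described as quantifying day-to-day shifts in HR distributions. A minimal sketch of that idea follows: each day's heart-rate samples are binned into a normalized histogram, and the KL divergence of each day from the previous day is computed. The bin edges, smoothing constant, and sample values are all assumptions for illustration; the study's actual feature pipeline is not specified in the abstract.

```python
import math


def hr_distribution(samples, edges, eps=1e-6):
    """Normalized histogram of heart-rate samples with a small smoothing term.

    Samples outside [edges[0], edges[-1]) are ignored. The eps term keeps every
    bin strictly positive so the KL divergence below is always finite.
    """
    counts = [0] * (len(edges) - 1)
    for bpm in samples:
        for i in range(len(counts)):
            if edges[i] <= bpm < edges[i + 1]:
                counts[i] += 1
                break
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical bin edges (beats per minute) and three days of HR samples.
EDGES = [40, 60, 80, 100, 120, 140]
days = [
    [62, 65, 70, 72, 75, 78, 81, 85],
    [63, 66, 69, 73, 76, 79, 82, 84],
    [85, 90, 95, 98, 102, 105, 110, 118],
]

dists = [hr_distribution(d, EDGES) for d in days]
# Shift of each day relative to the day before it.
shifts = [kl_divergence(dists[t], dists[t - 1]) for t in range(1, len(dists))]
print(shifts)
```

In this toy series the first two days fall into the same bins, so their divergence is zero, while the third day's elevated heart rates produce a large divergence from the second.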
{"title":"Longitudinal wearable sensor data enhance precision of Long COVID detection.","authors":"Chibuike K Uwakwe, Ekanath Srihari Rangan, Satyajit Kumar, Georg Gutjahr, Xuhui Miao, Andrew W Brooks, Peter Maguire, Tejaswini Mishra, Lettie McGuire, Michael P Snyder","doi":"10.1371/journal.pdig.0001093","DOIUrl":"10.1371/journal.pdig.0001093","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 11","pages":"e0001093"},"PeriodicalIF":7.7,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12633932/pdf/","platform":"Semanticscholar","paperid":"145566226","PMCID":"OA"}
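As a rough illustration of two of the HR feature families named in the Long COVID abstract above (KL divergence for day-to-day shifts in HR distributions, and Poincaré nonlinear descriptors), the following minimal Python sketch may help. It is not the authors' pipeline: the bin range (40-180 bpm), the 10 bpm bin width, the smoothing constant, the direction of the divergence, and every function name are assumptions made for this example, and the Poincaré descriptors are applied here to successive HR samples rather than to RR intervals.

```python
import math
import statistics


def hr_histogram(samples, lo=40, hi=180, width=10, eps=1e-6):
    """Smoothed probability histogram of heart-rate samples in bpm.

    The bin range, bin width and smoothing constant are illustrative
    assumptions, not values taken from the study.
    """
    n_bins = (hi - lo) // width
    counts = [0] * n_bins
    for s in samples:
        # Clamp out-of-range samples into the first or last bin.
        idx = min(max(int((s - lo) // width), 0), n_bins - 1)
        counts[idx] += 1
    total = sum(counts) + eps * n_bins
    # eps keeps every bin strictly positive so the log ratio is defined.
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def daily_kl_shifts(days):
    """KL divergence of each day's HR distribution from the previous day's."""
    hists = [hr_histogram(day) for day in days]
    return [kl_divergence(hists[i + 1], hists[i]) for i in range(len(hists) - 1)]


def poincare_sd1_sd2(series):
    """SD1 (short-term) and SD2 (long-term) spread of a Poincaré plot."""
    pairs = list(zip(series, series[1:]))
    diffs = [b - a for a, b in pairs]
    sums = [b + a for a, b in pairs]
    sd1 = statistics.stdev(diffs) / math.sqrt(2)
    sd2 = statistics.stdev(sums) / math.sqrt(2)
    return sd1, sd2


# Made-up data: two similar days followed by a day with elevated HR.
days = [
    [62, 65, 70, 68, 72, 66],
    [62, 65, 70, 68, 72, 66],
    [82, 85, 90, 88, 92, 86],
]
shifts = daily_kl_shifts(days)
sd1, sd2 = poincare_sd1_sd2([1, 2, 3, 4])
```

With these made-up inputs, the first entry of `shifts` is zero because the two days are identical, while the second is large because the third day's distribution moves to higher bins. A feature of this kind is one way to quantify a sustained change in resting heart rate after infection.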
Induction of labor (IOL) is a common yet complex clinical procedure associated with varying risks, including cesarean section (CS). Accurate prediction models may help support more informed, personalized decision-making. This study aimed to develop and validate an explainable machine learning prediction model for CS following IOL. We used population-based administrative perinatal datasets from two Australian states (New South Wales (NSW) and Queensland) covering all births between 2016 and 2019 for model development. Temporal validation was conducted using 2020 births from NSW, and geographical validation using 2016-2018 births from Victoria. We included women with singleton, cephalic, term, live births who attempted IOL and had no prior CS. Seven models (logistic regression, random forest, gradient boosting, LightGBM, XGBoost, CatBoost, and AdaBoost) were developed with hyperparameter tuning and feature selection. Performance was assessed using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve, calibration plot (overall and across sociodemographic subgroups), decision curve analysis, Brier Score, and model parsimony. SHAP (SHapley Additive exPlanations) values were used to explain predictor contributions. A total of 180,700 women were included in model development (mean age 31 ± 5 years; CS = 20.8%). The optimal model, developed using XGBoost with ten predictors, achieved AUROCs of 0.76 (95% CI: 0.75-0.77) and 0.75 (95% CI: 0.74-0.76) in temporal (n = 14,527; CS = 22.5%) and geographical (n = 14,755; CS = 19.0%) validations, respectively. The most influential predictors were nulliparity, pre-pregnancy body mass index, and maternal age, while diabetes and hypertension (pre-existing or pregnancy-related) contributed least. Women with higher predicted CS probabilities had increased inpatient costs and maternal morbidity, regardless of actual mode of birth. 
The final model is accessible via an interactive web application (https://csai-8ccf2690242c.herokuapp.com/). This model demonstrates strong predictive performance using routinely collected maternal factors. Further co-design and implementation research is needed before potential clinical adoption.
{"title":"Explainable machine learning model for predicting cesarean section following induction of labor: Development and external validation using real-world data.","authors":"Yanan Hu, Xin Zhang, Valerie Slavin, Joanne Enticott, Emily Callander","doi":"10.1371/journal.pdig.0001061","DOIUrl":"10.1371/journal.pdig.0001061","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 11","pages":"e0001061"},"PeriodicalIF":7.7,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12633899/pdf/","platform":"Semanticscholar","paperid":"145566234","PMCID":"OA"}
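Two of the performance measures listed in the cesarean section abstract above, the Brier score and the net benefit used in decision curve analysis, have simple closed forms. The sketch below shows them in plain Python under the usual definitions: the Brier score is the mean squared difference between predicted probability and outcome, and net benefit at threshold probability p_t is TP/n - (FP/n) * p_t / (1 - p_t). The function names and the toy data are assumptions for illustration and are not taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at or above the threshold.

    True positives count as benefit; false positives are penalised by the
    odds of the threshold probability. Requires 0 < threshold < 1.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up outcomes (1 = cesarean section) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.8, 0.3]

brier = brier_score(y_true, y_prob)
nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
```

A decision curve repeats the net benefit calculation across a range of thresholds and compares the model with the treat-all and treat-none (net benefit 0) strategies. On this toy data the model's net benefit at a 0.5 threshold exceeds that of treating everyone.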
Pub Date: 2025-11-20; eCollection Date: 2025-11-01; DOI: 10.1371/journal.pdig.0001085
Derya Demirci, Muhammad H Minhas, Cynthia Lokker, Catherine Demers
Chronic disease management is a burden for many patients. Digital health tools (DHTs) can leverage technology to rapidly develop and disseminate interventions that alleviate the obstacles patients face and promote self-care. Primary care physicians (PCPs) are most directly involved in the care of patients with chronic disease; however, their perspective is often overlooked. Understanding PCP attitudes is critical to developing an effective DHT for chronic disease management and to improving patient integration, adoption, and care outcomes. The purpose of this rapid review is to explore and identify PCPs' perspectives and attitudes regarding DHTs for chronic disease management and to generate major themes from the key literature. The themes will be used to guide DHT creators, clinicians, and policy makers on adoption and implementation considerations. We conducted a rapid review of primary qualitative research published between 2000 and 2022. Two reviewers independently conducted study screening, selection, and data abstraction. The data were analyzed in NVivo 12 using Braun and Clarke's deductive thematic analysis, and the themes identified in the articles were extracted and presented narratively. Nine qualitative research studies met the inclusion criteria. Themes were classified into two major categories: physician-patient relationship and physician-technology relationship. Within these, seven subcategories were identified: (1) Increased Physician Workload, (2) Data Capture & Data Quality, (3) Evidence-Based Care, (4) Education and Training, (5) Liability, (6) Patient Interactions, and (7) Patient Empowerment and Suitability. DHT creators and endorsers need to consider how DHTs affect both the physician-patient relationship and the physician-technology relationship, as these shape how PCPs perceive DHTs. PCPs' perspectives must be taken into consideration to promote self-care for patients living with chronic diseases.
{"title":"Primary care physicians' perspectives on digital health tools for chronic disease management: A rapid review.","authors":"Derya Demirci, Muhammad H Minhas, Cynthia Lokker, Catherine Demers","doi":"10.1371/journal.pdig.0001085","DOIUrl":"10.1371/journal.pdig.0001085","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 11","pages":"e0001085"},"PeriodicalIF":7.7,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12633909/pdf/","platform":"Semanticscholar","paperid":"145566347","PMCID":"OA"}