Pub Date: 2026-01-26 | DOI: 10.1016/j.cmpbup.2026.100232
Hyun Kim , Faeyza Rishad Ardi , Kévin Spinicci , Jae Kyoung Kim
Single-cell RNA sequencing (scRNA-seq) provides deep insights into cellular heterogeneity but demands robust dimensionality reduction (DR) and clustering to handle high-dimensional, noisy data. Many DR and clustering approaches rely on user-defined parameters, undermining reliability. Even automated clustering methods such as ChooseR and MultiK still employ fixed principal component (PC) defaults, limiting their full automation. To overcome this limitation, we propose a fully automated clustering approach by integrating scLENS, a method for optimal PC selection, with these tools. Our fully automated approach improves clustering performance by ∼14% for ChooseR and ∼10% for MultiK and identifies additional cell subtypes, highlighting the advantages of adaptive, data-driven DR.
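For readers who want a concrete picture of data-driven PC selection, the following Python sketch keeps only the principal components whose variance exceeds a permutation-based null. It is a generic illustration of the idea, not the scLENS algorithm; the function name and parameters are hypothetical.

import numpy as np
from sklearn.decomposition import PCA

def select_pcs(X, n_permutations=20, random_state=0):
    # X: cells-by-genes expression matrix (already normalized).
    rng = np.random.default_rng(random_state)
    n_comp = min(X.shape) - 1
    observed = PCA(n_components=n_comp).fit(X).explained_variance_
    # Null spectrum: shuffle each gene independently to destroy correlations.
    null_max = np.zeros(n_permutations)
    for i in range(n_permutations):
        X_perm = np.column_stack([rng.permutation(col) for col in X.T])
        null_max[i] = PCA(n_components=1).fit(X_perm).explained_variance_[0]
    # Keep PCs whose variance exceeds the largest variance seen under the null.
    return int(np.sum(observed > null_max.max()))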
{"title":"A fully automated, data-driven approach for dimensionality reduction and clustering in single-cell RNA-seq analysis","authors":"Hyun Kim , Faeyza Rishad Ardi , Kévin Spinicci , Jae Kyoung Kim","doi":"10.1016/j.cmpbup.2026.100232","DOIUrl":"10.1016/j.cmpbup.2026.100232","url":null,"abstract":"<div><div>Single-cell RNA sequencing (scRNA-seq) provides deep insights into cellular heterogeneity but demands robust dimensionality reduction (DR) and clustering to handle high-dimensional, noisy data. Many DR and clustering approaches rely on user-defined parameters, undermining reliability. Even automated clustering methods like ChooseR and MultiK still employ fixed principal component defaults, limiting their full automation. To overcome this limitation, we propose a fully automated clustering approach by integrating scLENS—a method for optimal PC selection—with these tools. Our fully automated approach improves clustering performance by ∼14 % for ChooseR and ∼10 % for MultiK and identifies additional cell subtypes, highlighting the advantages of adaptive, data-driven DR.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"9 ","pages":"Article 100232"},"PeriodicalIF":0.0,"publicationDate":"2026-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146078302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-07 | DOI: 10.1016/j.cmpbup.2025.100229
Sarah L. Alzamili , Salwa Shakir Baawi , Mustafa Noaman Kadhim , Dhiah Al-Shammary , Ayman Ibaida , Khandakar Ahmed
This paper introduces a novel clustering method for electroencephalogram (EEG) records based on the Ruzicka mathematical similarity measure and incorporates Particle Swarm Optimization (PSO) to enhance feature selection. Medical datasets often contain both convergent and divergent features, making feature selection a crucial step for accurate disease diagnosis and public health applications. The proposed Ruzicka-based clustering method groups EEG records into non-overlapping subgroups according to a defined similarity metric. Cluster centers are determined using a polynomial-based calculation, after which EEG records are assigned to clusters based on the Ruzicka similarity measure. After the EEG records are clustered into highly coherent groups, the PSO algorithm is employed to identify the most effective subset of features. By combining clustering with feature selection, this process enhances classification accuracy and contributes to more reliable diagnostic outcomes. The selected features are then evaluated using multiple classifiers, including Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), and Naive Bayes (NB). Accuracy, recall, F1-score, and precision are used to evaluate the model’s performance. Experimental validation is carried out on the Bonn University EEG dataset. With both the RF and NB classifiers, the proposed model achieved up to 100% accuracy, outperforming competing models. The proposed method can be implemented in medical organizations as a decision-support system to assist healthcare professionals in analyzing EEG patterns. Its integration can enhance the accuracy and efficiency of disease diagnosis, leading to improved patient care.
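The Ruzicka (weighted Jaccard) similarity underlying the method has a simple closed form, S(x, y) = Σ min(x_i, y_i) / Σ max(x_i, y_i) for non-negative vectors. A minimal Python sketch of the similarity and of a nearest-center assignment step follows; the paper's polynomial-based center calculation and PSO stage are not reproduced, and the function names are placeholders.

import numpy as np

def ruzicka_similarity(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    denom = np.sum(np.maximum(x, y))
    return np.sum(np.minimum(x, y)) / denom if denom > 0 else 0.0

def assign_to_clusters(records, centers):
    # records: (n_records, n_features); centers: (n_clusters, n_features); non-negative values.
    labels = [int(np.argmax([ruzicka_similarity(r, c) for c in centers])) for r in records]
    return np.array(labels)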
{"title":"Ruzicka similarity-based brain EEG clustering for improved intelligent epilepsy diagnosis","authors":"Sarah L. Alzamili , Salwa Shakir Baawi , Mustafa Noaman Kadhim , Dhiah Al-Shammary , Ayman Ibaida , Khandakar Ahmed","doi":"10.1016/j.cmpbup.2025.100229","DOIUrl":"10.1016/j.cmpbup.2025.100229","url":null,"abstract":"<div><div>This paper aims to introduce a novel clustering method for electroencephalogram (EEG) based on Ruzicka mathematical similarity and incorporates Particle Swarm Optimization (PSO) to enhance feature selection. Medical datasets often contain both convergent and divergent features, making feature selection a crucial step for accurate disease diagnosis and public health applications. The proposed Ruzicka-based clustering method groups EEG records into non-overlapping subgroups according to a defined similarity metric. Cluster centers are determined using a polynomial-based calculation, after which EEG records are assigned to clusters based on the Ruzicka similarity measure. After clustering the EEG records into highly coherent groups, PSO algorithm is employed to identify the most effective subset of features. This process enhances classification accuracy and contributes to more reliable diagnostic outcomes by combining clustering with feature selection. The selected features are then evaluated using multiple classifiers, including Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), and Naive Bayes (NB). Accuracy, recall, f1-score and precision measures are conducted to evaluate the model’s performance. Experimental validation is carried out on the Bonn University EEG dataset. With both RF and NB classifiers, the proposed model has achieved up to 100% accuracy compared to other models. The proposed method can be implemented in medical organizations as a decision-support system to assist healthcare professionals in analyzing EEG patterns. Its integration can enhance the accuracy and efficiency of disease diagnosis, leading to improved patient care.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"9 ","pages":"Article 100229"},"PeriodicalIF":0.0,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145977757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-03 | DOI: 10.1016/j.cmpbup.2026.100231
Ayano Suemori , Jota Maki , Hikaru Ooba , Hikari Nakato , Keiichi Oishi , Tomohiro Mitoma , Sakurako Mishima , Akiko Ohira , Satoe Kirino , Eriko Eto , Hisashi Masuyama
Background and Objective
Pelvimetry has historically shown limitations in diagnosing cephalopelvic disproportion, yet recent evidence suggests potential predictive value. This study uses artificial intelligence to reassess pelvimetry's utility in predicting cesarean section.
Methods
This single-center, retrospective case-control study included pregnant women between 37 weeks 0 days and 41 weeks 6 days of gestation who underwent pelvic radiography for suspected cephalopelvic disproportion from January 2015 to August 2023. Pelvic radiographic measurements were obtained using the Guthmann-Sussmann method. Maternal characteristics, ultrasound examination data, and pelvimetric measurements were extracted from electronic medical records as potential predictors of delivery outcomes. The input data were analyzed using four machine learning models: Light Gradient Boosting Machine, Random Forest, Extreme Gradient Boosting, and Category Boosting. The primary outcome was the hierarchical importance of pelvic measurements in the predictive models.
Results
Analysis included 355 participants. The strongest predictors were the differences between (1) the obstetric conjugate and biparietal diameter and (2) the interspinous diameter and biparietal diameter. The area under the receiver operating characteristic curve was 0.74 for the Light Gradient Boosting Machine, 0.85 for Random Forest, 0.83 for Extreme Gradient Boosting, and 0.82 for Category Boosting.
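As a hedged illustration of the modeling idea, the sketch below derives the two difference features highlighted above and fits one of the reported model families, scoring it by AUC. The DataFrame and column names are assumptions, and the feature list and hyper-parameters are not those of the study.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def fit_delivery_model(df: pd.DataFrame):
    df = df.copy()
    # Difference features reported as the strongest predictors.
    df["oc_minus_bpd"] = df["obstetric_conjugate"] - df["biparietal_diameter"]
    df["isd_minus_bpd"] = df["interspinous_diameter"] - df["biparietal_diameter"]
    features = ["oc_minus_bpd", "isd_minus_bpd", "maternal_age", "maternal_height"]
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df["cesarean_section"],
        test_size=0.2, stratify=df["cesarean_section"], random_state=0)
    model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return model, auc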
Conclusions
We developed high-performance machine learning models demonstrating that pelvimetric measurements, particularly the differences between the obstetric conjugate and biparietal diameter and between the interspinous diameter and biparietal diameter, combined with maternal and ultrasound factors, are strong predictors of cesarean section. The model’s ability to capture nonlinear associations may enhance predictive accuracy, and reassessing pelvimetric values could support delivery planning in clinical settings.
{"title":"Reassessment of pelvic radiographic measurements for delivery prediction using machine learning","authors":"Ayano Suemori , Jota Maki , Hikaru Ooba , Hikari Nakato , Keiichi Oishi , Tomohiro Mitoma , Sakurako Mishima , Akiko Ohira , Satoe Kirino , Eriko Eto , Hisashi Masuyama","doi":"10.1016/j.cmpbup.2026.100231","DOIUrl":"10.1016/j.cmpbup.2026.100231","url":null,"abstract":"<div><h3>Background and Objective</h3><div>Pelvimetry has historically shown limitations in diagnosing cephalopelvic disproportion, yet recent evidence suggests potential predictive value. This study uses artificial intelligence to reassess pelvimetry's utility in predicting cesarean section.</div></div><div><h3>Methods</h3><div>This single-center, retrospective case-control study included pregnant women at 37 weeks 0 days and 41 weeks 6 days of gestation, who underwent pelvic radiography for suspected cephalopelvic disproportion from January 2015 to August 2023. Pelvic radiographic measurements were obtained using the Guthmann-Sussmann method. Maternal characteristics, ultrasound examination data, and pelvimetric measurements were extracted from electronic medical records as potential predictors of delivery outcomes. In this study, the input data were analyzed using four machine learning models: Light Gradient Boosting Machine, Random Forest, Extreme Gradient Boosting, and Category Boosting. The primary outcome was the hierarchical importance of pelvic measurements in the predictive models.</div></div><div><h3>Results</h3><div>Analysis included 355 participants. The strongest predictors were the differences between (1) the obstetric conjugate and biparietal diameter and (2) the interspinous diameter and biparietal diameter. The receiver operating characteristic curve for each model was Light Gradient Boosting Machine 0.74, Random Forest 0.85, Extreme Gradient Boosting 0.83, and Category Boosting 0.82.</div></div><div><h3>Conclusions</h3><div>We developed high-performance machine learning models demonstrating that pelvimetric measurements— particularly, the differences between the obstetric conjugate and biparietal diameter, and between the interspinous diameter and biparietal diameter —combined with maternal and ultrasound factors, are strong predictors of cesarean section. The model’s ability to capture nonlinear associations may enhance predictive accuracy, and reassessing pelvimetric values could support delivery planning in clinical settings.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"9 ","pages":"Article 100231"},"PeriodicalIF":0.0,"publicationDate":"2026-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-02 | DOI: 10.1016/j.cmpbup.2025.100226
Nadia Brancati, Maria Frucci
Early detection of breast cancer is crucial to enhancing patient outcomes through effective treatment. Ultrasound imaging, a simple, low-cost, and non-invasive technique, can help differentiate cystic from solid masses, mainly on the basis of an analysis of the detected anomalies’ boundaries. Automatic methods for detecting mass boundaries in ultrasound images can reduce the dependence of this analysis on the radiologist’s experience. We propose USE-MiT, a segmentation method for breast ultrasound images based on a UNet architecture in which the encoder and decoder modules are interfaced through a configuration of Squeeze and Excitation Attention modules and the encoder is a Mix Transformer. The model was trained and validated with 4-fold cross-validation on the Breast Ultrasound Image Dataset and tested on an independent dataset, Breast-Lesions-USG. The experiments demonstrate the effectiveness of the model, which achieves an overall Dice of 0.88 and an IoU of 0.64, outperforming the state of the art. The source code is available at https://github.com/nbrancati/USE-MiT.
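For orientation, a generic squeeze-and-excitation channel-attention block is sketched below in PyTorch; it illustrates the kind of module USE-MiT uses to interface encoder and decoder but is not the authors' exact configuration, and the reduction ratio is an assumption.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global average per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                     # excitation: per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # rescale feature maps channel-wise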
{"title":"USE-MiT: Attention-based model for breast ultrasound images segmentation","authors":"Nadia Brancati, Maria Frucci","doi":"10.1016/j.cmpbup.2025.100226","DOIUrl":"10.1016/j.cmpbup.2025.100226","url":null,"abstract":"<div><div>Early detection of breast cancer disease is crucial to enhancing patient outcomes through effective treatment. Ultrasound imaging, a simple, low-cost, and non-invasive technique, can help differentiate cystic from solid masses, mainly on the basis of the analysis of the detected anomalies’ boundaries. Automatic detection methods of mass boundaries in ultrasound images can reduce the dependence on the radiologist’s experience for this analysis. We propose USE-MiT, a segmentation method for breast ultrasound images, based on a UNet architecture in which the encoder and decoder modules are interfaced through a configuration based on Squeeze and Excitation Attention modules, and the encoder structure is represented by a Mix Transformer. The model was trained and validated, with a 4-fold cross-validation, on the Breast Ultrasound Image Dataset, and was tested on the independent dataset, namely Breast-Lesions-USG. The experiments have demonstrated the efficiency of the model, achieving an overall Dice of 0<em>.</em>88 and an IoU of 0<em>.</em>64, outperforming the state-of-the-art. The source code is available at <span><span>https://github.com/nbrancati/USE-MiT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"9 ","pages":"Article 100226"},"PeriodicalIF":0.0,"publicationDate":"2026-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-31 | DOI: 10.1016/j.cmpbup.2025.100230
Marie-Christine Pali , Christina Schwaiger , Malik Galijasevic , Valentin K. Ladenhauf , Stephanie Mangesius , Elke R. Gizewski
The analysis of carotid arteries, particularly plaques, in multi-sequence Magnetic Resonance Imaging (MRI) data is crucial for assessing the risk of atherosclerosis and ischemic stroke. Accurate segmentation is important for computing metrics and radiomic features that quantify the state of atherosclerosis. However, the complex morphology of plaques and the scarcity of labeled data pose significant challenges. In this work, we address these problems and propose a semi-supervised deep learning-based approach designed to effectively integrate multi-sequence MRI data for the segmentation of the carotid artery vessel wall and plaque. The proposed algorithm consists of two networks: a coarse localization model identifies the region of interest, guided by prior knowledge of the position and number of carotid arteries, followed by a fine segmentation model for precise delineation of vessel walls and plaques. To effectively integrate complementary information across different MRI sequences, we investigate different fusion strategies and introduce a multi-level, multi-sequence version of the U-Net architecture. To address the challenges of limited labeled data and the complexity of carotid artery MRI, we propose a semi-supervised approach that enforces consistency under various input transformations. Our approach is evaluated on 52 patients with arteriosclerosis, each with five MRI sequences. Comprehensive experiments demonstrate the effectiveness of our approach and emphasize the role of fusion point selection in U-Net-based architectures. To validate the accuracy of our results, we also include an expert-based assessment of model performance. Our findings highlight the potential of fusion strategies and semi-supervised learning for improving carotid artery segmentation in data-limited MRI applications.
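A minimal sketch of a consistency-based semi-supervised objective is given below: a supervised term on labeled slices plus an agreement term between predictions for two randomly transformed views of unlabeled slices. The model, transformation function, and weighting are placeholders rather than the authors' exact training recipe.

import torch
import torch.nn.functional as F

def semi_supervised_loss(model, x_labeled, y_labeled, x_unlabeled, augment, lam=0.1):
    # Supervised term: cross-entropy on labeled multi-sequence inputs.
    supervised = F.cross_entropy(model(x_labeled), y_labeled)
    # Consistency term: predictions under two random transformations should agree.
    p1 = torch.softmax(model(augment(x_unlabeled)), dim=1)
    p2 = torch.softmax(model(augment(x_unlabeled)), dim=1)
    consistency = F.mse_loss(p1, p2)
    return supervised + lam * consistency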
{"title":"Semi-supervised learning and integration of multi-sequence MR-images for carotid vessel wall and plaque segmentation","authors":"Marie-Christine Pali , Christina Schwaiger , Malik Galijasevic , Valentin K. Ladenhauf , Stephanie Mangesius , Elke R. Gizewski","doi":"10.1016/j.cmpbup.2025.100230","DOIUrl":"10.1016/j.cmpbup.2025.100230","url":null,"abstract":"<div><div>The analysis of carotid arteries, particularly plaques, in multi-sequence Magnetic Resonance Imaging (MRI) data is crucial for assessing the risk of atherosclerosis and ischemic stroke. In order to evaluate metrics and radiomic features, quantifying the state of atherosclerosis, accurate segmentation is important. However, the complex morphology of plaques and the scarcity of labeled data poses significant challenges. In this work, we address these problems and propose a semi-supervised deep learning-based approach designed to effectively integrate multi-sequence MRI data for the segmentation of carotid artery vessel wall and plaque. The proposed algorithm consists of two networks: a coarse localization model identifies the region of interest guided by some prior knowledge on the position and number of carotid arteries, followed by a fine segmentation model for precise delineation of vessel walls and plaques. To effectively integrate complementary information across different MRI sequences, we investigate different fusion strategies and introduce a multi-level multi-sequence version of U-Net architecture. To address the challenges of limited labeled data and the complexity of carotid artery MRI, we propose a semi-supervised approach that enforces consistency under various input transformations. Our approach is evaluated on 52 patients with arteriosclerosis, each with five MRI sequences. Comprehensive experiments demonstrate the effectiveness of our approach and emphasize the role of fusion point selection in U-Net-based architectures. To validate the accuracy of our results, we also include an expert-based assessment of model performance. Our findings highlight the potential of fusion strategies and semi-supervised learning for improving carotid artery segmentation in data-limited MRI applications.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"9 ","pages":"Article 100230"},"PeriodicalIF":0.0,"publicationDate":"2025-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145940016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-26 | DOI: 10.1016/j.cmpbup.2025.100227
Nithinkumar K.V., Anand R.
Respiratory sounds captured via auscultation contain critical clues for diagnosing pulmonary conditions. Automated classification of these sounds faces the dual challenge of distinguishing subtle acoustic patterns and addressing the severe class imbalance inherent in clinical datasets. This study investigates methods for classifying respiratory sounds into multiple disease categories, with a specific focus on mitigating pronounced class imbalances. In this study, we developed and evaluated a hybrid deep learning model incorporating a Long Short-Term Memory (LSTM) network as a feature sequence encoder, followed by a Kolmogorov–Arnold Network (KAN) for classification. This architecture was combined with a comprehensive feature extraction pipeline and targeted imbalance mitigation techniques. The model was evaluated using a public respiratory sound database comprising six classes with a highly skewed distribution. Strategies such as focal loss, class-specific data augmentation, and Synthetic Minority Over-sampling Technique (SMOTE) are employed to improve minority class recognition. Our results demonstrate that the proposed Hybrid LSTM-KAN model achieves a high overall accuracy of 94.6% and a macro-averaged F1-score of 0.703. This performance is notable, given that the dominant class (COPD) constitutes over 86% of the data. While challenges persist for the rarest classes (Bronchiolitis and URTI, with F1-scores of approximately 0.45 and 0.44, respectively), the approach shows significant improvement in their detection compared to naive baselines and performs strongly on other minority classes, such as bronchiectasis (F1-score ≈ 0.84). This study contributes to the development of intelligent auscultation tools for the early detection of respiratory diseases, highlighting the potential of combining recurrent neural networks with advanced KAN architectures and focused imbalance handling.
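Because focal loss is one of the listed imbalance-mitigation strategies, a standard multi-class formulation is sketched below in PyTorch; gamma and the optional per-class weights are illustrative, not the study's values.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    # logits: (batch, n_classes); targets: (batch,) integer class labels.
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    weight = (1.0 - pt) ** gamma                               # down-weight easy examples
    if alpha is not None:                                      # optional per-class weighting
        weight = weight * alpha[targets]
    return -(weight * log_pt).mean()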
{"title":"Investigation into respiratory sound classification for an imbalanced data set using hybrid LSTM-KAN architectures","authors":"Nithinkumar K.V., Anand R.","doi":"10.1016/j.cmpbup.2025.100227","DOIUrl":"10.1016/j.cmpbup.2025.100227","url":null,"abstract":"<div><div>Respiratory sounds captured via auscultation contain critical clues for diagnosing pulmonary conditions. Automated classification of these sounds faces the dual challenge of distinguishing subtle acoustic patterns and addressing the severe class imbalance inherent in clinical datasets. This study investigates methods for classifying respiratory sounds into multiple disease categories, with a specific focus on mitigating pronounced class imbalances. In this study, we developed and evaluated a hybrid deep learning model incorporating a Long Short-Term Memory (LSTM) network as a feature sequence encoder, followed by a Kolmogorov–Arnold Network (KAN) for classification. This architecture was combined with a comprehensive feature extraction pipeline and targeted imbalance mitigation techniques. The model was evaluated using a public respiratory sound database comprising six classes with a highly skewed distribution. Strategies such as focal loss, class-specific data augmentation, and Synthetic Minority Over-sampling Technique (SMOTE) are employed to improve minority class recognition. Our results demonstrate that the proposed Hybrid LSTM-KAN model achieves a high overall accuracy of 94.6% and a macro-averaged <span><math><msub><mrow><mi>F</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span>-score of 0.703. This performance is notable, given that the dominant class (COPD) constitutes over 86% of the data. While challenges persist for the rarest classes (Bronchiolitis and URTI, with <span><math><msub><mrow><mi>F</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span>-scores of approximately 0.45 and 0.44, respectively), the approach shows significant improvement in their detection compared to naive baselines and performs strongly on other minority classes, such as bronchiectasis (<span><math><msub><mrow><mi>F</mi></mrow><mrow><mn>1</mn></mrow></msub></math></span>-score <span><math><mo>≈</mo></math></span> 0.84). This study contributes to the development of intelligent auscultation tools for the early detection of respiratory diseases, highlighting the potential of combining recurrent neural networks with advanced KAN architectures and focused imbalance handling.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"9 ","pages":"Article 100227"},"PeriodicalIF":0.0,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145940013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-23 | DOI: 10.1016/j.cmpbup.2025.100228
Damilare Emmanuel Olatunji , Julius Dona Zannu, Carine Pierrette Mukamakuza, Godbright Nixon Uiso, Chol Buol, John Bosco Thuo, Nchofon Tagha Ghogomu, Mona Mamoun Mubarak Aman, Evelyne Umubyeyi
AI-powered stethoscopes offer a promising alternative for screening rheumatic heart disease (RHD), particularly in regions with limited diagnostic infrastructure. Early detection is vital, yet echocardiography, the gold standard tool, remains largely inaccessible in low-resource settings due to cost and workforce constraints. This review systematically examines machine learning (ML) applications from 2015 to 2025 that analyze electrocardiogram (ECG) and phonocardiogram (PCG) data to support accessible, scalable screening of all RHD variants in relation to the World Heart Federation's "25 by 25" goal to reduce RHD mortality. Using PRISMA-ScR guidelines, 37 peer-reviewed studies were selected from PubMed, IEEE Xplore, Scopus, and Embase. Convolutional neural networks (CNNs) dominate recent efforts, achieving a median accuracy of 97.75%, F1-score of 0.95, and AUROC of 0.89. However, challenges remain: 73% of studies used single-center datasets, 81.1% relied on private data, only 10.8% were externally validated, and none assessed cost-effectiveness. Although 45.9% originated from endemic regions, few addressed demographic diversity or implementation feasibility. These gaps underscore the disconnect between model performance and clinical readiness. Bridging this divide requires standardized benchmark datasets, prospective trials in endemic areas, and broader validation. If these issues are addressed, AI-augmented auscultation could transform cardiovascular diagnostics in underserved populations, thereby aiding early detection. This review also offers practical recommendations for building accessible ML-based RHD screening tools, aiming to close the diagnostic gap in low-resource settings where conventional auscultation may miss up to 90% of cases and echocardiography remains out of reach.
{"title":"Machine learning-based analysis of ECG and PCG signals for rheumatic heart disease detection: A scoping review (2015–2025)","authors":"Damilare Emmanuel Olatunji , Julius Dona Zannu, Carine Pierrette Mukamakuza, Godbright Nixon Uiso, Chol Buol, John Bosco Thuo, Nchofon Tagha Ghogomu, Mona Mamoun Mubarak Aman, Evelyne Umubyeyi","doi":"10.1016/j.cmpbup.2025.100228","DOIUrl":"10.1016/j.cmpbup.2025.100228","url":null,"abstract":"<div><div>AI-powered stethoscopes offer a promising alternative for screening rheumatic heart disease (RHD), particularly in regions with limited diagnostic infrastructure. Early detection is vital, yet echocardiography, the gold standard tool, remains largely inaccessible in low-resource settings due to cost and workforce constraints. This review systematically examines machine learning (ML) applications from 2015 to 2025 that analyze electrocardiogram (ECG) and phonocardiogram (PCG) data to support accessible, scalable screening of all RHD variants in relation to the World Heart Federation's \"25 by 25\" goal to reduce RHD mortality. Using PRISMA-ScR guidelines, 37 peer-reviewed studies were selected from PubMed, IEEE Xplore, Scopus, and Embase. Convolutional neural networks (CNNs) dominate recent efforts, achieving a median accuracy of 97.75 %, F1-score of 0.95, and AUROC of 0.89. However, challenges remain: 73 % of studies used single-center datasets, 81.1 % relied on private data, only 10.8 % were externally validated, and none assessed cost-effectiveness. Although 45.9 % originated from endemic regions, few addressed demographic diversity or implementation feasibility. These gaps underscore the disconnect between model performance and clinical readiness. Bridging this divide requires standardized benchmark datasets, prospective trials in endemic areas, and broader validation. If these issues are addressed, AI-augmented auscultation could transform cardiovascular diagnostics in underserved populations, thereby aiding early detection. This review also offers practical recommendations for building accessible ML-based RHD screening tools, aiming to close the diagnostic gap in low-resource settings where conventional auscultation may miss up to 90 % of cases and echocardiography remains out of reach.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"9 ","pages":"Article 100228"},"PeriodicalIF":0.0,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145940014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-18 | DOI: 10.1016/j.cmpbup.2025.100225
Khaled Al-Thelaya , Nauman Ullah Gilal , Fahad Majeed , Mahmood Alzubaidi , Sabri Boughorbel , William Mifsud , Marco Agus , Jens Schneider
Whole Slide Imaging (WSI) generates vast data sets in histopathology. Manual annotation is impractical and time-consuming, so there is a dire need for effective analysis tools. However, a lack of annotated data hampers supervised learning of models that generalize well across domains. Point annotations have emerged as a practical remedy. Because the randomness of the tissue slice angle and depth renders size measurements of nuclei, such as those provided by segmentation, meaningless (unlike in other medical tasks), point annotations are efficient and useful due to their sparseness. In this paper, we formulate the task of nuclei detection as a density estimation problem. We use a U-Net architecture with PoolFormer encoders as the basis for computing point annotations for nuclei detection. Specifically, we use Gaussian kernels to generate target density masks from a segmented data set and use isocontouring to separate overlapping nuclei. We show that conformal prediction can compute a near-optimal threshold for contouring, which significantly enhances our detection rate. To address cross-domain generalization issues, our framework uses color normalization. As a result, our framework sets a new state of the art in nucleus localization on both the PanNuke and MoNuSeg data sets, and we demonstrate our cross-domain generalization capabilities using samples of the TCGA data set.
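The density-estimation target can be built directly from point annotations; the sketch below places a unit impulse at each annotated nucleus centre and smooths it with a Gaussian kernel. The kernel width is an assumption, not the value used by NuDetect.

import numpy as np
from scipy.ndimage import gaussian_filter

def density_mask(points, shape, sigma=4.0):
    # points: iterable of (row, col) nucleus centres; shape: (H, W) of the image patch.
    mask = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        if 0 <= int(r) < shape[0] and 0 <= int(c) < shape[1]:
            mask[int(r), int(c)] += 1.0           # one impulse per annotated nucleus
    return gaussian_filter(mask, sigma=sigma)     # spread each impulse into a Gaussian blob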
{"title":"NuDetect: A point annotation-based framework for nuclei detection using density estimation and conformal thresholding","authors":"Khaled Al-Thelaya , Nauman Ullah Gilal , Fahad Majeed , Mahmood Alzubaidi , Sabri Boughorbel , William Mifsud , Marco Agus , Jens Schneider","doi":"10.1016/j.cmpbup.2025.100225","DOIUrl":"10.1016/j.cmpbup.2025.100225","url":null,"abstract":"<div><div>Whole Slide Imaging (WSI) generates vast data sets in histopathology. Manual annotation is impractical and time consuming. There is, thus, a dire need for effective analysis tools. However, a lack of annotated data hampers supervised learning of models that generalize well across domains. Point annotations have emerged as a practical remedy. Motivated by the fact that the randomness of the tissue slice angle and depth renders size measurements of nuclei — such as it would be provided by segmentation — meaningless (unlike in other medical tasks), point annotations are efficient and useful due to their sparseness. In this paper, we formulate the task of nuclei detection as a density estimation problem. We use a U-Net architecture with PoolFormer encoders as the basis to compute point-annotations for nuclei detection. Specifically, we use Gaussian kernels to generate target density masks from a segmented data set and use isocontouring to separate overlapping nuclei. We show that conformal prediction can compute a near-optimal threshold for contouring. This significantly enhances our detection rate. To address cross-domain generalization issues, our framework uses color normalization. As a result, our framework sets a new state-of-the-art in nucleus localization on both the PanNuke and MoNuSeg data sets, and we demonstrate our cross-domain generalization capabilities using samples of the TCGA data set.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"9 ","pages":"Article 100225"},"PeriodicalIF":0.0,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145940009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-16 | DOI: 10.1016/j.cmpbup.2025.100224
Katja Löwenstein , Johanna Rehrl , Anja Schuster , Michael Gadermayr
The in vitro scratch assay is widely used in cell biology to assess the rate of wound closure under a variety of therapeutic interventions. While manual measurement is subjective and vulnerable to intra- and interobserver variability, computer-based tools are theoretically objective but in practice often contain parameters that are adjusted manually (individually per image or data set) and thereby introduce a source of subjectivity. Modern deep learning approaches typically require large amounts of annotated training data, which complicates instant applicability. In this paper, we deeply investigate the Segment Anything Model (SAM), a deep foundation model based on interactive point-prompts, which enables class-agnostic segmentation without tuning the network’s parameters on any domain-specific training data. With respect to segmentation accuracy, the interactive method significantly outperformed a semi-objective baseline that required manual inspection and, when necessary, parameter adjustments for each image. Experiments were conducted to evaluate the impact of variability due to interactive prompting. The results exhibited remarkably low intra- and interobserver variability, clearly surpassing the consistency of manual segmentation by domain experts. In addition, a fully automated zero-shot approach was explored, incorporating the self-supervised learning model DINOv2 as a preprocessing step before sampling input points for SAM, with various sampling methods systematically investigated.
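As a pointer to how such point-prompted segmentation is typically invoked, the sketch below uses the publicly released segment_anything package; the checkpoint path, input image, and click coordinates are placeholders, and the paper's exact prompting and DINOv2 sampling pipeline is not reproduced.

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # checkpoint path is a placeholder
predictor = SamPredictor(sam)
predictor.set_image(image)                        # image: HxWx3 uint8 RGB scratch-assay frame
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 384]]),          # one foreground click inside the wound area
    point_labels=np.array([1]),                   # 1 = foreground point, 0 = background point
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]              # keep the highest-scoring candidate mask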
{"title":"Towards objective In-Vitro wound healing assessment with segment anything: A large evaluation of interactive and automated pipelines","authors":"Katja Löwenstein , Johanna Rehrl , Anja Schuster , Michael Gadermayr","doi":"10.1016/j.cmpbup.2025.100224","DOIUrl":"10.1016/j.cmpbup.2025.100224","url":null,"abstract":"<div><div>The <em>in vitro</em> scratch assay is a widely used assay in cell biology to assess the rate of wound closure related to a variety of therapeutic interventions. While manual measurement is subjective and vulnerable to intra- and interobserver variability, computer-based tools are theoretically objective, but in practice often contain parameters which are manually adjusted (individually per image or data set) and thereby provide a source for subjectivity. Modern deep learning approaches typically require large annotated training data which complicates instant applicability. In this paper, we deeply investigate the Segment Anything Model (SAM), a deep foundation model based on interactive point-prompts, which enables class-agnostic segmentation without tuning the network’s parameters based on any domain specific training data. With respect to segmentation accuracy, the interactive method significantly outperformed a semi-objective baseline that required manual inspection and, when necessary, parameter adjustments for each image. Experiments were conducted to evaluate the impact of variability due to interactive prompting. The results exhibited remarkably low intra- and interobserver variability, clearly surpassing the consistency of manual segmentation by domain experts. In addition, a fully automated zero-shot approach was explored, incorporating the self-supervised learning model DINOv2 as a preprocessing step before sampling input points for SAM, with various sampling methods systematically investigated.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"9 ","pages":"Article 100224"},"PeriodicalIF":0.0,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145766132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-01 | DOI: 10.1016/j.cmpbup.2024.100170
Emily Diaz Badilla , Ignasi Cos , Claudio Sampieri , Berta Alegre , Isabel Vilaseca , Simone Balocco , Petia Radeva
Background and Objective:
Oropharynx Squamous Cell Carcinoma (OPSCC) linked to Human Papillomavirus (HPV) exhibits a more favorable prognosis than other squamous cell carcinomas of the upper aerodigestive tract. Finding reliable non-invasive detection methods for this prognostic entity is key to proposing appropriate therapeutic decisions. This study aims to provide a comprehensive method based on pre-treatment clinical data for predicting a patient’s HPV status over a large OPSCC patient cohort, employing explainability techniques to interpret the significance and effects of the features.
Materials and Methods:
We used the clinical information in the RADCURE dataset to train six Machine Learning algorithms, evaluating them via cross-validation for grid-search hyper-parameter tuning and feature selection, with final performance measured on a 20% held-out test set. For explainability, SHAP and LIME were used to identify the most relevant relationships and their effect on the predictive model. Furthermore, additional publicly available datasets were scrutinized to compare outcomes and assess the method’s generalization across diverse feature sets and populations.
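A hedged example of the kind of SHAP analysis described, explaining a fitted tree-based classifier on held-out features; the variable names are placeholders, and the study's exact model and preprocessing are not reproduced.

import shap

explainer = shap.TreeExplainer(model)         # model: a fitted tree-ensemble classifier
shap_values = explainer.shap_values(X_test)   # per-feature contributions for each patient
shap.summary_plot(shap_values, X_test)        # global view of feature importance and direction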
Results:
The best model yielded an AUC of 0.85, a sensitivity of 0.83, and a specificity of 0.75 on the test set. The explainability analysis highlighted the remarkable significance of specific clinical attributes, in particular the oropharynx subsite tumor location and the patient’s smoking history. The contribution of each variable to the prediction was substantiated by constructing 95% confidence intervals for the model coefficients by means of a 10,000-sample bootstrap and by analyzing top contributors across the best-performing models.
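The bootstrap procedure for coefficient confidence intervals can be sketched as follows: refit a simple model on resampled patients and take empirical percentiles of the coefficients. The 10,000 replicates follow the text above, but the logistic-regression model and variable names here are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_coef_ci(X, y, n_boot=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    coefs = np.full((n_boot, X.shape[1]), np.nan)
    for b in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))        # resample patients with replacement
        if len(np.unique(y[idx])) < 2:                    # skip degenerate resamples
            continue
        coefs[b] = LogisticRegression(max_iter=1000).fit(X[idx], y[idx]).coef_[0]
    lower = np.nanpercentile(coefs, 100 * alpha / 2, axis=0)
    upper = np.nanpercentile(coefs, 100 * (1 - alpha / 2), axis=0)
    return lower, upper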
Conclusions:
The combination of specific clinical factors typically collected for OPSCC patients, such as smoking habits and the tumor's oropharynx sub-location, with the ML models presented here can by itself provide an informed analysis of HPV status, together with a principled use of data science techniques to explain it. Future work should focus on adding other data modalities, such as CT scans, to enhance performance and uncover new relations, thus aiding medical practitioners in diagnosing OPSCC more accurately.
{"title":"Predictive analysis of clinical features for HPV status in oropharynx squamous cell carcinoma: A machine learning approach with explainability","authors":"Emily Diaz Badilla , Ignasi Cos , Claudio Sampieri , Berta Alegre , Isabel Vilaseca , Simone Balocco , Petia Radeva","doi":"10.1016/j.cmpbup.2024.100170","DOIUrl":"10.1016/j.cmpbup.2024.100170","url":null,"abstract":"<div><h3>Background and Objective:</h3><div>Oropharynx Squamous Cell Carcinoma (OPSCC) linked to Human Papillomavirus (HPV) exhibits a more favorable prognosis than other squamous cell carcinomas of the upper aerodigestive tract. Finding reliable non-invasive detection methods of this prognostic entity is key to propose appropriate therapeutic decisions. This study aims to provide a comprehensive method based on pre-treatment clinical data for predicting the patient’s HPV status over a large OPSCC patient cohort and employing explainability techniques to interpret the significance and effects of the features.</div></div><div><h3>Materials and Methods:</h3><div>We employed the RADCURE dataset clinical information to train six Machine Learning algorithms, evaluating them via cross-validation for grid search hyper-parameter tuning and feature selection as well as a final performance measurement on a 20% sample test set. For explainability, SHAP and LIME were used to identify the most relevant relationships and their effect on the predictive model. Furthermore, additional publicly available datasets were scrutinized to compare outcomes and assess the method’s generalization across diverse feature sets and populations.</div></div><div><h3>Results:</h3><div>The best model yielded an AUC of 0.85, a sensitivity of 0.83, and a specificity of 0.75 over the testing set. The explainability analysis highlighted the remarkable significance of specific clinical attributes, in particular the oropharynx subsite tumor location and the patient’s smoking history. The contribution of each variable to the prediction was substantiated by creating a 95% confidence intervals of model coefficients by means of a 10,000 sample bootstrap and by analyzing top contributors across the best-performing models.</div></div><div><h3>Conclusions:</h3><div>The combination of specific clinical factors typically collected for OPSCC patients, such as smoking habits and the tumor oropharynx sub-location, along with the ML models hereby presented, can by themselves provide an informed analysis of the HPV status, and of proper use of data science techniques to explain it. Future work should focus on adding other data modalities such as CT scans to enhance performance and to uncover new relations, thus aiding medical practitioners in diagnosing OPSCC more accurately.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"7 ","pages":"Article 100170"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143180353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}