
Computer methods and programs in biomedicine update: Latest articles

A fully automated, data-driven approach for dimensionality reduction and clustering in single-cell RNA-seq analysis
Pub Date: 2026-01-26 Volume: 9 Article: 100232 DOI: 10.1016/j.cmpbup.2026.100232
Hyun Kim, Faeyza Rishad Ardi, Kévin Spinicci, Jae Kyoung Kim
Single-cell RNA sequencing (scRNA-seq) provides deep insights into cellular heterogeneity but requires robust dimensionality reduction (DR) and clustering to handle high-dimensional, noisy data. Many DR and clustering approaches rely on user-defined parameters, which undermines their reliability. Even automated clustering methods such as ChooseR and MultiK still use a fixed default number of principal components (PCs), which limits their full automation. To overcome this limitation, we propose a fully automated clustering approach that integrates these tools with scLENS, a method for optimal PC selection. Our fully automated approach improves clustering performance by ~14% for ChooseR and ~10% for MultiK and identifies additional cell subtypes, highlighting the advantages of adaptive, data-driven DR.
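The point of replacing a fixed PC default is that the number of PCs is estimated from the data itself. One widely used criterion of this kind comes from random matrix theory: after standardizing genes, eigenvalues of the covariance matrix that exceed the Marchenko–Pastur upper edge expected for pure noise are treated as signal. The sketch below implements only that generic threshold; it is not scLENS's implementation, which involves additional steps, and the function name, arguments, and eigenvalues are hypothetical.

```python
import math
from typing import Sequence


def count_signal_pcs(eigenvalues: Sequence[float], n_cells: int, n_genes: int,
                     noise_variance: float = 1.0) -> int:
    """Count covariance eigenvalues above the Marchenko-Pastur upper edge.

    eigenvalues: eigenvalues of X^T X / n_cells for a cells x genes matrix X
    whose genes (columns) have been standardized.
    noise_variance: per-entry variance assumed for the pure-noise component.
    """
    gamma = n_genes / n_cells  # aspect ratio of the data matrix
    # Largest eigenvalue expected from pure noise (asymptotically, large matrices).
    mp_upper_edge = noise_variance * (1.0 + math.sqrt(gamma)) ** 2
    return sum(1 for lam in eigenvalues if lam > mp_upper_edge)


# Illustrative eigenvalues for 2,000 cells x 500 genes (not real data):
# gamma = 0.25, so the noise edge is (1 + 0.5) ** 2 = 2.25.
eigs = [40.2, 12.5, 3.1, 2.3, 2.2, 1.9, 1.4]
print(count_signal_pcs(eigs, n_cells=2000, n_genes=500))  # 4
```

Eigenvalues sitting just above the edge (2.3 here) are the fragile cases, because finite matrices fluctuate around the asymptotic edge; practical tools add further checks before trusting them.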
Citations: 0
Ruzicka similarity-based brain EEG clustering for improved intelligent epilepsy diagnosis
Pub Date: 2026-01-07 Volume: 9 Article: 100229 DOI: 10.1016/j.cmpbup.2025.100229
Sarah L. Alzamili, Salwa Shakir Baawi, Mustafa Noaman Kadhim, Dhiah Al-Shammary, Ayman Ibaida, Khandakar Ahmed
This paper introduces a novel clustering method for electroencephalogram (EEG) records based on the Ruzicka similarity measure, combined with Particle Swarm Optimization (PSO) for feature selection. Medical datasets often contain both convergent and divergent features, which makes feature selection a crucial step for accurate disease diagnosis and public health applications. The proposed Ruzicka-based method groups EEG records into non-overlapping subgroups according to a defined similarity metric. Cluster centers are determined by a polynomial-based calculation, after which each EEG record is assigned to a cluster based on the Ruzicka similarity measure. After the EEG records have been grouped into highly coherent clusters, the PSO algorithm is used to identify the most effective subset of features. By combining clustering with feature selection, this process improves classification accuracy and supports more reliable diagnostic outcomes. The selected features are then evaluated with multiple classifiers: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), and Naive Bayes (NB). Model performance is assessed using accuracy, precision, recall, and F1-score. Experimental validation is carried out on the Bonn University EEG dataset. With both the RF and NB classifiers, the proposed model achieved up to 100% accuracy, outperforming the other models compared. The proposed method could be deployed in medical organizations as a decision-support system to help healthcare professionals analyze EEG patterns, where it could improve the accuracy and efficiency of disease diagnosis and, in turn, patient care.
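The Ruzicka similarity of two non-negative vectors is the sum of element-wise minima divided by the sum of element-wise maxima (it is also known as the weighted Jaccard similarity). The sketch below shows that measure and the assignment of each record to its most similar cluster center. It assumes EEG records have already been reduced to non-negative feature vectors (for example, band powers) and that the cluster centers are given; the paper's polynomial-based center calculation and its PSO feature selection are not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence


def ruzicka_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Ruzicka (weighted Jaccard) similarity of two non-negative vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("Ruzicka similarity is defined for non-negative values")
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors are treated as identical.
    return 1.0 if den == 0 else num / den


def assign_to_clusters(records: Sequence[Sequence[float]],
                       centers: Sequence[Sequence[float]]) -> List[int]:
    """Assign each record to the index of the center it is most similar to."""
    labels = []
    for rec in records:
        sims = [ruzicka_similarity(rec, c) for c in centers]
        labels.append(max(range(len(centers)), key=sims.__getitem__))
    return labels


# Illustrative feature vectors (not from the Bonn dataset).
centers = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
records = [[0.9, 0.1, 0.0], [0.1, 0.8, 1.2]]
print(assign_to_clusters(records, centers))  # [0, 1]
```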
Citations: 0
Reassessment of pelvic radiographic measurements for delivery prediction using machine learning
Pub Date: 2026-01-03 Volume: 9 Article: 100231 DOI: 10.1016/j.cmpbup.2026.100231
Ayano Suemori, Jota Maki, Hikaru Ooba, Hikari Nakato, Keiichi Oishi, Tomohiro Mitoma, Sakurako Mishima, Akiko Ohira, Satoe Kirino, Eriko Eto, Hisashi Masuyama

Background and Objective

Pelvimetry has historically shown limitations in diagnosing cephalopelvic disproportion, yet recent evidence suggests potential predictive value. This study uses artificial intelligence to reassess pelvimetry's utility in predicting cesarean section.

Methods

This single-center, retrospective case-control study included pregnant women between 37 weeks 0 days and 41 weeks 6 days of gestation who underwent pelvic radiography for suspected cephalopelvic disproportion from January 2015 to August 2023. Pelvic radiographic measurements were obtained using the Guthmann-Sussmann method. Maternal characteristics, ultrasound examination data, and pelvimetric measurements were extracted from electronic medical records as potential predictors of delivery outcome. The input data were analyzed using four machine learning models: Light Gradient Boosting Machine (LightGBM), Random Forest, Extreme Gradient Boosting (XGBoost), and Category Boosting (CatBoost). The primary outcome was the ranked importance of pelvic measurements in the predictive models.

Results

The analysis included 355 participants. The strongest predictors were the differences between (1) the obstetric conjugate and the biparietal diameter and (2) the interspinous diameter and the biparietal diameter. The area under the receiver operating characteristic curve (AUROC) was 0.74 for Light Gradient Boosting Machine, 0.85 for Random Forest, 0.83 for Extreme Gradient Boosting, and 0.82 for Category Boosting.
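The strongest predictors are simple differences between a pelvic diameter and the fetal biparietal diameter (BPD), and each model is summarized by its AUROC. A minimal sketch of both, assuming all diameters are in millimetres and that label 1 marks a cesarean section; the function names and example values are hypothetical, and the boosted-tree models themselves are not reproduced. The AUROC here is the rank-based (Mann–Whitney) estimate: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
from typing import Dict, Sequence


def margin_features(obstetric_conjugate_mm: float, interspinous_mm: float,
                    biparietal_mm: float) -> Dict[str, float]:
    """Pelvic-diameter minus fetal-head-diameter margins (inputs in mm)."""
    return {
        "oc_minus_bpd": obstetric_conjugate_mm - biparietal_mm,
        "is_minus_bpd": interspinous_mm - biparietal_mm,
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability a positive case outscores a negative one.

    labels: 1 = cesarean section, 0 = vaginal delivery; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Illustrative values only, not patient data from the study.
print(margin_features(110.0, 100.0, 95.0))  # {'oc_minus_bpd': 15.0, 'is_minus_bpd': 5.0}
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```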

Conclusions

We developed high-performance machine learning models demonstrating that pelvimetric measurements, particularly the differences between the obstetric conjugate and the biparietal diameter and between the interspinous diameter and the biparietal diameter, combined with maternal and ultrasound factors, are strong predictors of cesarean section. The models' ability to capture nonlinear associations may enhance predictive accuracy, and reassessing pelvimetric values could support delivery planning in clinical settings.
Citations: 0
USE-MiT: Attention-based model for breast ultrasound images segmentation
Pub Date: 2026-01-02 Volume: 9 Article: 100226 DOI: 10.1016/j.cmpbup.2025.100226
Nadia Brancati, Maria Frucci
Early detection of breast cancer is crucial for improving patient outcomes through effective treatment. Ultrasound imaging, a simple, low-cost, and non-invasive technique, can help differentiate cystic from solid masses, mainly through analysis of the boundaries of the detected anomalies. Automatic detection of mass boundaries in ultrasound images can reduce the dependence of this analysis on the radiologist's experience. We propose USE-MiT, a segmentation method for breast ultrasound images based on a UNet architecture in which the encoder and decoder modules are connected through a configuration of Squeeze-and-Excitation attention modules, and the encoder is a Mix Transformer. The model was trained and validated with 4-fold cross-validation on the Breast Ultrasound Image Dataset and tested on an independent dataset, Breast-Lesions-USG. The experiments demonstrate the effectiveness of the model, which achieves an overall Dice of 0.88 and an IoU of 0.64, outperforming state-of-the-art methods. The source code is available at https://github.com/nbrancati/USE-MiT.
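Segmentation quality here is reported as Dice and IoU. A minimal sketch of both metrics on flattened binary masks, in plain Python; the function name is hypothetical, and real pipelines usually compute these per image on arrays. For a single pair of masks the two metrics are linked by IoU = Dice / (2 − Dice), so dataset-level values also depend on how per-image scores are aggregated.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Dice coefficient and IoU for two flattened binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in target if t)
    union = p_sum + t_sum - inter
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0, 1.0
    return 2 * inter / (p_sum + t_sum), inter / union


# Toy 6-pixel masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
d, j = dice_and_iou([1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0])
print(round(d, 4), j)  # 0.6667 0.5
```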
Citations: 0
Semi-supervised learning and integration of multi-sequence MR-images for carotid vessel wall and plaque segmentation
Pub Date: 2025-12-31 Volume: 9 Article: 100230 DOI: 10.1016/j.cmpbup.2025.100230
Marie-Christine Pali, Christina Schwaiger, Malik Galijasevic, Valentin K. Ladenhauf, Stephanie Mangesius, Elke R. Gizewski
The analysis of carotid arteries, particularly plaques, in multi-sequence Magnetic Resonance Imaging (MRI) data is crucial for assessing the risk of atherosclerosis and ischemic stroke. Accurate segmentation is important for computing the metrics and radiomic features that quantify the state of atherosclerosis. However, the complex morphology of plaques and the scarcity of labeled data pose significant challenges. In this work, we address these problems and propose a semi-supervised, deep learning-based approach designed to effectively integrate multi-sequence MRI data for segmenting the carotid artery vessel wall and plaque. The proposed algorithm consists of two networks: a coarse localization model that identifies the region of interest, guided by prior knowledge of the position and number of carotid arteries, followed by a fine segmentation model that precisely delineates vessel walls and plaques. To effectively integrate complementary information across different MRI sequences, we investigate different fusion strategies and introduce a multi-level, multi-sequence version of the U-Net architecture. To address the challenges of limited labeled data and the complexity of carotid artery MRI, we propose a semi-supervised approach that enforces consistency under various input transformations. Our approach is evaluated on 52 patients with arteriosclerosis, each with five MRI sequences. Comprehensive experiments demonstrate the effectiveness of our approach and emphasize the role of fusion point selection in U-Net-based architectures. To validate the accuracy of our results, we also include an expert-based assessment of model performance. Our findings highlight the potential of fusion strategies and semi-supervised learning for improving carotid artery segmentation in data-limited MRI applications.
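The abstract does not specify which input transformations or which loss the consistency term uses. As a minimal sketch, assume a horizontal flip and a mean-squared penalty: a segmentation model should produce the same mask whether the flip is applied before or after prediction, and checking this requires no labels. The toy "models" below are hypothetical stand-ins for the U-Net, used only to show that the loss is zero for a flip-equivariant predictor and positive otherwise.

```python
from typing import Callable, List

Image = List[List[float]]


def hflip(img: Image) -> Image:
    """Horizontal flip of a 2-D image stored as a list of rows."""
    return [row[::-1] for row in img]


def consistency_loss(model: Callable[[Image], Image], unlabeled: Image) -> float:
    """Mean squared disagreement between model(flip(x)) and flip(model(x)).

    No label is needed, so this term can be computed on unlabeled scans.
    """
    a = model(hflip(unlabeled))
    b = hflip(model(unlabeled))
    diffs = [(p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def pixelwise(img: Image) -> Image:
    """Toy flip-equivariant 'model': thresholds each pixel independently."""
    return [[1.0 if v > 0.5 else 0.0 for v in row] for row in img]


def position_biased(img: Image) -> Image:
    """Toy non-equivariant 'model': output depends on the column index."""
    return [[v + j for j, v in enumerate(row)] for row in img]


x = [[0.0, 1.0], [1.0, 0.0]]
print(consistency_loss(pixelwise, x))        # 0.0
print(consistency_loss(position_biased, x))  # 1.0
```

In a typical semi-supervised setup this term is added, with some weight, to the ordinary supervised loss on the labeled scans; that weighting is an assumption here, not a detail taken from the paper.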
Citations: 0
Investigation into respiratory sound classification for an imbalanced data set using hybrid LSTM-KAN architectures
Pub Date: 2025-12-26 Volume: 9 Article: 100227 DOI: 10.1016/j.cmpbup.2025.100227
Nithinkumar K.V., Anand R.
Respiratory sounds captured via auscultation contain critical clues for diagnosing pulmonary conditions. Automated classification of these sounds faces the dual challenge of distinguishing subtle acoustic patterns and handling the severe class imbalance inherent in clinical datasets. This study investigates methods for classifying respiratory sounds into multiple disease categories, with a specific focus on mitigating pronounced class imbalance. We developed and evaluated a hybrid deep learning model that uses a Long Short-Term Memory (LSTM) network as a feature sequence encoder, followed by a Kolmogorov–Arnold Network (KAN) for classification. This architecture was combined with a comprehensive feature extraction pipeline and targeted imbalance mitigation techniques. The model was evaluated on a public respiratory sound database comprising six classes with a highly skewed distribution. Strategies such as focal loss, class-specific data augmentation, and the Synthetic Minority Over-sampling Technique (SMOTE) are employed to improve minority class recognition. Our results show that the proposed hybrid LSTM-KAN model achieves an overall accuracy of 94.6% and a macro-averaged F1-score of 0.703. This performance is notable given that the dominant class (COPD) constitutes over 86% of the data. While challenges persist for the rarest classes (bronchiolitis and URTI, with F1-scores of approximately 0.45 and 0.44, respectively), the approach shows significant improvement in detecting them compared with naive baselines and performs strongly on other minority classes, such as bronchiectasis (F1-score ≈ 0.84). This study contributes to the development of intelligent auscultation tools for the early detection of respiratory diseases and highlights the potential of combining recurrent neural networks with advanced KAN architectures and focused imbalance handling.
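Two ingredients from the abstract, sketched in plain Python: focal loss, which down-weights well-classified samples so that training concentrates on hard (often minority-class) examples, and the macro-averaged F1-score, which weights every class equally regardless of its size. The default gamma = 2 is a common choice assumed here; the paper's settings are not given in the abstract, and the toy labels are illustrative only. They show why accuracy alone can mislead when one class dominates.

```python
import math
from typing import Sequence


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one sample, given the predicted probability of its true class.

    With gamma = 0 and alpha = 1 this is ordinary cross-entropy, -log(p_true);
    gamma > 0 shrinks the loss of confidently correct (easy) samples.
    """
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); the denominator is >= 1 for any seen class.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy imbalanced example: a classifier that always predicts the majority class.
y_true = ["COPD"] * 9 + ["URTI"]
y_pred = ["COPD"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.9
print(round(macro_f1(y_true, y_pred), 3))  # 0.474
```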
Citations: 0
Machine learning-based analysis of ECG and PCG signals for rheumatic heart disease detection: A scoping review (2015–2025)
Pub Date: 2025-12-23 Volume: 9 Article: 100228 DOI: 10.1016/j.cmpbup.2025.100228
Damilare Emmanuel Olatunji, Julius Dona Zannu, Carine Pierrette Mukamakuza, Godbright Nixon Uiso, Chol Buol, John Bosco Thuo, Nchofon Tagha Ghogomu, Mona Mamoun Mubarak Aman, Evelyne Umubyeyi
AI-powered stethoscopes offer a promising alternative for screening for rheumatic heart disease (RHD), particularly in regions with limited diagnostic infrastructure. Early detection is vital, yet echocardiography, the gold-standard tool, remains largely inaccessible in low-resource settings because of cost and workforce constraints. This review systematically examines machine learning (ML) applications from 2015 to 2025 that analyze electrocardiogram (ECG) and phonocardiogram (PCG) data to support accessible, scalable screening for all RHD variants, in the context of the World Heart Federation's "25 by 25" goal of reducing RHD mortality. Following the PRISMA-ScR guidelines, 37 peer-reviewed studies were selected from PubMed, IEEE Xplore, Scopus, and Embase. Convolutional neural networks (CNNs) dominate recent efforts, achieving a median accuracy of 97.75%, F1-score of 0.95, and AUROC of 0.89. However, challenges remain: 73% of studies used single-center datasets, 81.1% relied on private data, only 10.8% were externally validated, and none assessed cost-effectiveness. Although 45.9% originated from endemic regions, few addressed demographic diversity or implementation feasibility. These gaps underscore the disconnect between model performance and clinical readiness. Bridging this divide requires standardized benchmark datasets, prospective trials in endemic areas, and broader validation. If these issues are addressed, AI-augmented auscultation could transform cardiovascular diagnostics in underserved populations and aid early detection. This review also offers practical recommendations for building accessible ML-based RHD screening tools, aiming to close the diagnostic gap in low-resource settings where conventional auscultation may miss up to 90% of cases and echocardiography remains out of reach.
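Assuming each reported percentage is a share of all 37 included studies (the abstract does not state the denominators), the figures correspond to whole-number study counts of 27, 30, 4, and 17. A quick check that back-calculates those counts and recomputes each percentage from them:

```python
N_STUDIES = 37  # studies included in the review

# Percentages as reported in the abstract.
reported_pct = {
    "single-center datasets": 73.0,
    "private data": 81.1,
    "externally validated": 10.8,
    "from endemic regions": 45.9,
}

# Assumption: every percentage is a share of all 37 studies.
implied_counts = {k: round(p / 100 * N_STUDIES) for k, p in reported_pct.items()}

for label, n in implied_counts.items():
    # Recompute the percentage from the whole-number count as a consistency check.
    print(f"{label}: {n}/{N_STUDIES} = {100 * n / N_STUDIES:.1f}%")
```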
人工智能听诊器为筛查风湿性心脏病(RHD)提供了一种有希望的替代方法,特别是在诊断基础设施有限的地区。早期检测至关重要,但由于成本和劳动力限制,超声心动图作为一种金标准工具,在资源匮乏的环境中仍然难以获得。本综述系统地研究了2015年至2025年机器学习(ML)的应用,这些应用分析了心电图(ECG)和心音图(PCG)数据,以支持与世界心脏联合会(World Heart Federation)降低RHD死亡率的“25 by 25”目标相关的所有RHD变异的可访问、可扩展的筛查。使用PRISMA-ScR指南,从PubMed、IEEE explore、Scopus和Embase中选择了37项同行评议的研究。卷积神经网络(cnn)在最近的研究中占主导地位,实现了97.75%的中位数准确率,f1得分为0.95,AUROC为0.89。然而,挑战仍然存在:73%的研究使用单中心数据集,81.1%依赖于私人数据,只有10.8%的研究经过外部验证,没有评估成本效益。虽然45.9%来自流行地区,但很少涉及人口多样性或实施可行性。这些差距强调了模型性能和临床准备之间的脱节。弥合这一鸿沟需要标准化的基准数据集、流行地区的前瞻性试验和更广泛的验证。如果这些问题得到解决,人工智能增强听诊可以改变服务不足人群的心血管诊断,从而有助于早期发现。本综述还为建立可访问的基于ml的RHD筛查工具提供了实用建议,旨在缩小资源匮乏地区的诊断差距,在这些地区,传统听诊可能错过高达90%的病例,超声心动图仍然遥不可及。
Published in: Computer methods and programs in biomedicine update, Vol. 9, Article 100228 (2025). DOI: 10.1016/j.cmpbup.2025.100228
Citations: 0
NuDetect: A point annotation-based framework for nuclei detection using density estimation and conformal thresholding
Pub Date : 2025-12-18 DOI: 10.1016/j.cmpbup.2025.100225
Khaled Al-Thelaya , Nauman Ullah Gilal , Fahad Majeed , Mahmood Alzubaidi , Sabri Boughorbel , William Mifsud , Marco Agus , Jens Schneider
Whole Slide Imaging (WSI) generates vast data sets in histopathology, and annotating them manually is impractical and time-consuming. There is thus a pressing need for effective analysis tools. However, a lack of annotated data hampers supervised learning of models that generalize well across domains. Point annotations have emerged as a practical remedy. Because the random angle and depth at which tissue is sliced render nucleus size measurements (such as those provided by segmentation) meaningless, unlike in other medical tasks, sparse point annotations are an efficient and useful alternative. In this paper, we formulate nuclei detection as a density estimation problem. We use a U-Net architecture with PoolFormer encoders as the basis for computing point annotations for nuclei detection. Specifically, we use Gaussian kernels to generate target density masks from a segmented data set and use isocontouring to separate overlapping nuclei. We show that conformal prediction can compute a near-optimal threshold for contouring, which significantly improves our detection rate. To address cross-domain generalization, our framework uses color normalization. As a result, our framework sets a new state of the art in nucleus localization on both the PanNuke and MoNuSeg data sets, and we demonstrate its cross-domain generalization using samples from the TCGA data set.
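The abstract does not give NuDetect's exact formulas, so the plain-Python sketch below only illustrates the two ideas under stated assumptions: a target density map built by placing one Gaussian kernel (peak value 1) at each annotated nucleus, and a split-conformal threshold computed from predicted density values at annotated nuclei in a held-out calibration set. The function names, the nonconformity score (the negated peak density), and the `sigma` and `alpha` values are all hypothetical, not the authors' implementation.

```python
import math


def density_target(points, height, width, sigma=2.0):
    """Target density map: one unnormalized Gaussian bump (peak 1.0) per annotated point.

    `points` are (row, col) pixel coordinates, assumed here to be nucleus
    centroids derived from a segmentation mask.
    """
    target = [[0.0] * width for _ in range(height)]
    for r0, c0 in points:
        for r in range(height):
            for c in range(width):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                target[r][c] += math.exp(-d2 / (2.0 * sigma ** 2))
    return target


def conformal_threshold(calib_peaks, alpha=0.1):
    """Split-conformal lower bound on the predicted density at a true nucleus.

    `calib_peaks` are predicted density values at annotated nuclei in a
    calibration set. Assuming exchangeability, a new true nucleus has a peak
    at or above the returned threshold with probability >= 1 - alpha.
    """
    scores = sorted(-p for p in calib_peaks)  # nonconformity: negated peak value
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))      # finite-sample corrected rank
    if k > n:
        return -math.inf                        # too few calibration points for this alpha
    return -scores[k - 1]


# Toy usage with invented values.
target = density_target([(5, 5), (5, 9)], height=11, width=15, sigma=1.5)
calib = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(conformal_threshold(calib, alpha=0.1))
```

With ten calibration peaks 0.1, 0.2, ..., 1.0 and alpha = 0.1, the rank is ceil(11 × 0.9) = 10, so the threshold is the smallest peak, 0.1. In a full pipeline, the network's predicted map (not the target) would then be isocontoured at that level.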
Published in: Computer methods and programs in biomedicine update, Vol. 9, Article 100225 (2025). DOI: 10.1016/j.cmpbup.2025.100225
Citations: 0
Towards objective In-Vitro wound healing assessment with segment anything: A large evaluation of interactive and automated pipelines
Pub Date : 2025-12-16 DOI: 10.1016/j.cmpbup.2025.100224
Katja Löwenstein , Johanna Rehrl , Anja Schuster , Michael Gadermayr
The in vitro scratch assay is widely used in cell biology to assess the rate of wound closure under a variety of therapeutic interventions. Manual measurement is subjective and vulnerable to intra- and interobserver variability. Computer-based tools are objective in theory, but in practice they often contain parameters that are adjusted manually (individually per image or data set) and thereby reintroduce subjectivity. Modern deep learning approaches typically require large annotated training sets, which hinders immediate applicability. In this paper, we investigate in depth the Segment Anything Model (SAM), a deep foundation model driven by interactive point prompts, which enables class-agnostic segmentation without tuning the network's parameters on any domain-specific training data. In terms of segmentation accuracy, the interactive method significantly outperformed a semi-objective baseline that required manual inspection and, when necessary, parameter adjustment for each image. We also conducted experiments to evaluate the variability introduced by interactive prompting. The results showed remarkably low intra- and interobserver variability, clearly surpassing the consistency of manual segmentation by domain experts. In addition, we explored a fully automated zero-shot approach that uses the self-supervised model DINOv2 as a preprocessing step before sampling input points for SAM, and we systematically investigated various sampling methods.
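Running SAM itself requires pretrained weights, so it is not reproduced here. Once a wound mask is available, however, two downstream quantities are simple to compute: the relative wound closure between two time points, and the Dice overlap between two segmentations of the same image, which is one common way to quantify intra- or interobserver agreement. Both the closure formula and the use of Dice are assumptions for illustration; the abstract does not state which measures the authors used, and the `stripe` helper is a hypothetical toy mask generator.

```python
def area(mask):
    """Count foreground (wound) pixels in a binary mask stored as nested lists."""
    return sum(1 for row in mask for v in row if v)


def dice(mask_a, mask_b):
    """Dice overlap of two equally sized binary masks: 1.0 means identical."""
    inter = sum(
        1
        for row_a, row_b in zip(mask_a, mask_b)
        for a, b in zip(row_a, row_b)
        if a and b
    )
    total = area(mask_a) + area(mask_b)
    return 1.0 if total == 0 else 2.0 * inter / total


def wound_closure_percent(mask_t0, mask_t):
    """Relative closure (A0 - At) / A0 * 100, with A the wound area in pixels."""
    a0 = area(mask_t0)
    if a0 == 0:
        raise ValueError("initial wound area is zero")
    return 100.0 * (a0 - area(mask_t)) / a0


def stripe(lo, hi, size=10):
    """Hypothetical toy mask: a vertical wound spanning columns lo..hi."""
    return [[1 if lo <= c <= hi else 0 for c in range(size)] for _ in range(size)]


print(wound_closure_percent(stripe(2, 7), stripe(4, 5)))  # wound shrinks from 60 to 20 px
print(dice(stripe(2, 7), stripe(3, 7)))                   # two raters' masks, same image
```

Here the wound shrinks from 60 to 20 pixels, a closure of about 66.7%, and the two raters' masks reach a Dice score of 10/11 (about 0.91).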
Published in: Computer methods and programs in biomedicine update, Vol. 9, Article 100224 (2025). DOI: 10.1016/j.cmpbup.2025.100224
Citations: 0
Predictive analysis of clinical features for HPV status in oropharynx squamous cell carcinoma: A machine learning approach with explainability
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2024.100170
Emily Diaz Badilla , Ignasi Cos , Claudio Sampieri , Berta Alegre , Isabel Vilaseca , Simone Balocco , Petia Radeva

Background and Objective:

Oropharynx Squamous Cell Carcinoma (OPSCC) linked to Human Papillomavirus (HPV) has a more favorable prognosis than other squamous cell carcinomas of the upper aerodigestive tract. Reliable non-invasive methods for detecting this prognostic entity are key to making appropriate therapeutic decisions. This study aims to provide a comprehensive method that predicts patients' HPV status from pre-treatment clinical data in a large OPSCC cohort, and to use explainability techniques to interpret the significance and effects of the features.

Materials and Methods:

We used the clinical information in the RADCURE dataset to train six machine learning algorithms, relying on cross-validation for grid-search hyperparameter tuning and feature selection, followed by a final performance measurement on a held-out 20% test set. For explainability, SHAP and LIME were used to identify the most relevant relationships and their effects on the predictive model. Furthermore, additional publicly available datasets were examined to compare outcomes and to assess how well the method generalizes across diverse feature sets and populations.
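A minimal sketch of the evaluation protocol described above, assuming a stratified 80/20 split and 5-fold cross-validation (neither the stratification nor the number of folds is stated in the abstract). The test indices are set aside before any tuning, so the grid search only ever sees the training portion. `cv_score` is a hypothetical callback standing in for "fit one of the six models on RADCURE features and return a validation score"; SHAP and LIME are not shown.

```python
import random


def stratified_holdout(labels, test_frac=0.2, seed=0):
    """Split indices into train/test so each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def kfold(indices, k=5):
    """Yield (fit, validate) index lists for k interleaved (round-robin) folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


def grid_search(train_idx, param_grid, cv_score, k=5):
    """Return the setting with the best mean cross-validated score on the training split only.

    `cv_score(params, fit_idx, val_idx)` is a hypothetical user-supplied function that
    fits a model on fit_idx and returns a validation score such as AUC.
    """
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        scores = [cv_score(params, fit, val) for fit, val in kfold(train_idx, k)]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score


# Toy usage: invented balanced labels and a scorer that simply prefers C = 1.0.
labels = [0] * 50 + [1] * 50
train_idx, test_idx = stratified_holdout(labels, test_frac=0.2)
grid = [{"C": 0.1}, {"C": 1.0}, {"C": 10.0}]
best, score = grid_search(train_idx, grid, lambda params, fit, val: -abs(params["C"] - 1.0))
print(len(train_idx), len(test_idx), best)
```

Keeping the split logic independent of the model lets the same protocol be reused unchanged across all six algorithms; the held-out `test_idx` would be scored once, after `best` is chosen.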

Results:

The best model achieved an AUC of 0.85, a sensitivity of 0.83, and a specificity of 0.75 on the test set. The explainability analysis highlighted the importance of specific clinical attributes, in particular the oropharyngeal subsite of the tumor and the patient's smoking history. The contribution of each variable to the prediction was substantiated by computing 95% confidence intervals of the model coefficients with a 10,000-sample bootstrap and by analyzing the top contributors across the best-performing models.
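The abstract reports 95% confidence intervals for model coefficients from a 10,000-sample bootstrap but does not say which model or interval method was used. The sketch below assumes a percentile bootstrap that resamples patients (rows) with replacement and, purely as a hypothetical stand-in for a fitted model coefficient, uses the ordinary least-squares slope of a single feature; the toy data are invented.

```python
import random


def bootstrap_ci(rows, statistic, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample rows with replacement and recompute the statistic."""
    rng = random.Random(seed)
    n = len(rows)
    stats = sorted(
        statistic([rows[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


def ols_slope(rows):
    """Least-squares slope of y on x for (x, y) pairs (hypothetical stand-in coefficient).

    Requires at least two distinct x values in the sample.
    """
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy (feature, outcome) pairs with a true slope near 2.
rows = [(x, 2 * x + (x % 3) - 1) for x in range(30)]
print(bootstrap_ci(rows, ols_slope))  # 10,000 resamples, 95% percentile interval
```

An interval that excludes zero is the usual informal signal that a variable's contribution is not an artifact of the particular sample; a logistic model would replace `ols_slope` with a refit per resample.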

Conclusions:

Specific clinical factors typically collected for OPSCC patients, such as smoking habits and the oropharyngeal subsite of the tumor, combined with the ML models presented here, can on their own provide an informed analysis of HPV status, and appropriate data science techniques can be used to explain that analysis. Future work should add other data modalities, such as CT scans, to improve performance and uncover new relationships, thus helping medical practitioners diagnose OPSCC more accurately.
Published in: Computer methods and programs in biomedicine update, Vol. 7, Article 100170 (2025). DOI: 10.1016/j.cmpbup.2024.100170
Citations: 0
Journal
Computer methods and programs in biomedicine update