Multiscale guided attention network for optic disc segmentation of retinal images
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100180
A Z M Ehtesham Chowdhury, Andrew Mehnert, Graham Mann, William H. Morgan, Ferdous Sohel
Optic disc (OD) segmentation from retinal images is crucial for diagnosing, assessing, and tracking the progression of several sight-threatening diseases. This paper presents a deep machine-learning method for semantically segmenting the OD from retinal images. The method, named multiscale guided attention network (MSGANet-OD), comprises encoders for extracting multiscale features and decoders for constructing segmentation maps from the extracted features. The decoder includes a guided attention module that incorporates structural, contextual, and illumination information to segment the OD. A custom loss function is proposed to enforce the optic disc's elliptical shape constraint and to alleviate the influence of blood vessels in the region where they overlap the OD. MSGANet-OD was trained and tested on an in-house clinical color retinal image dataset captured during ophthalmodynamometry, as well as on several publicly available color fundus image datasets: DRISHTI-GS, RIM-ONE-r3, and REFUGE1. Experimental results show that MSGANet-OD achieved superior OD segmentation performance on ophthalmodynamometry images compared to widely used segmentation methods, and competitive results compared to state-of-the-art OD segmentation methods on the public datasets. The proposed method can be used in automated systems to quantitatively assess optic nerve head abnormalities (e.g., glaucoma, optic disc neuropathy) and vascular changes in the OD region.
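The elliptical shape constraint lends itself to a differentiable penalty. Below is a minimal PyTorch sketch of one plausible construction — a Dice term down-weighted where a vessel mask overlaps the disc, plus a penalty on probability mass falling outside a moment-fitted ellipse. The weights and the exact form of the constraint are illustrative assumptions, not the published MSGANet-OD loss.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    # pred, target: (B, H, W) soft masks
    inter = (pred * target).sum(dim=(1, 2))
    union = pred.sum(dim=(1, 2)) + target.sum(dim=(1, 2))
    return 1.0 - (2.0 * inter + eps) / (union + eps)

def ellipse_consistency(pred, eps=1e-6):
    # Fit an ellipse to the soft mask via image moments and penalise
    # probability mass that falls outside the fitted ellipse.
    B, H, W = pred.shape
    ys = torch.arange(H, device=pred.device).float().view(1, H, 1)
    xs = torch.arange(W, device=pred.device).float().view(1, 1, W)
    m = pred.sum(dim=(1, 2)) + eps                       # total mass
    cy = (pred * ys).sum(dim=(1, 2)) / m                 # centroid
    cx = (pred * xs).sum(dim=(1, 2)) / m
    dy = ys - cy.view(B, 1, 1)
    dx = xs - cx.view(B, 1, 1)
    vy = (pred * dy ** 2).sum(dim=(1, 2)) / m            # second moments
    vx = (pred * dx ** 2).sum(dim=(1, 2)) / m
    cxy = (pred * dy * dx).sum(dim=(1, 2)) / m
    det = (vx * vy - cxy ** 2).clamp_min(eps)
    # Mahalanobis distance to the fitted 2D Gaussian; d2 <= 4 is roughly
    # the 2-sigma ellipse.
    d2 = (vy.view(B, 1, 1) * dx ** 2 - 2 * cxy.view(B, 1, 1) * dx * dy
          + vx.view(B, 1, 1) * dy ** 2) / det.view(B, 1, 1)
    outside = torch.sigmoid(d2 - 4.0)                    # soft "outside" mask
    return (pred * outside).sum(dim=(1, 2)) / m          # mass leaking outside

def shape_constrained_loss(pred, target, vessel_mask, w_shape=0.1, w_vessel=0.5):
    # Hypothetical composite: Dice down-weighted on the vessel-disc overlap
    # (vessel_mask in {0,1}) plus the elliptical prior.
    weight = 1.0 - w_vessel * vessel_mask
    return (dice_loss(pred * weight, target * weight)
            + w_shape * ellipse_consistency(pred)).mean()
```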
{"title":"Multiscale guided attention network for optic disc segmentation of retinal images","authors":"A Z M Ehtesham Chowdhury , Andrew Mehnert , Graham Mann , William H. Morgan , Ferdous Sohel","doi":"10.1016/j.cmpbup.2025.100180","DOIUrl":"10.1016/j.cmpbup.2025.100180","url":null,"abstract":"<div><div>Optic disc (OD) segmentation from retinal images is crucial for diagnosing, assessing, and tracking the progression of several sight-threatening diseases. This paper presents a deep machine-learning method for semantically segmenting OD from retinal images. The method is named multiscale guided attention network (MSGANet-OD), comprising encoders for extracting multiscale features and decoders for constructing segmentation maps from the extracted features. The decoder also includes a guided attention module that incorporates features related to structural, contextual, and illumination information to segment OD. A custom loss function is proposed to retain the optic disc's geometrical shape (i.e., elliptical) constraint and to alleviate the blood vessels' influence in the overlapping region between the OD and vessels. MSGANet-OD was trained and tested on an in-house clinical color retinal image dataset captured during ophthalmodynamometry as well as on several publicly available color fundus image datasets, e.g., DRISHTI-GS, RIM-ONE-r3, and REFUGE1. Experimental results show that MSGANet-OD achieved superior OD segmentation performance from ophthalmodynamometry images compared to widely used segmentation methods. Our method also achieved competitive results compared to state-of-the-art OD segmentation methods on public datasets. The proposed method can be used in automated systems to quantitatively assess optic nerve head abnormalities (e.g., glaucoma, optic disc neuropathy) and vascular changes in the OD region.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"7 ","pages":"Article 100180"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143179430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Independence on the lead of the identification of the ventricular depolarization in the electrocardiogram in wearable devices
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100196
Noemi Giordano, Silvia Cannone, Gabriella Balestra, Marco Knaflitz
Goal
The home monitoring of cardiac time intervals reduces hospitalization and mortality of cardiovascular patients, but it requires a reliable time reference in the electrocardiogram. The use of different single leads, typical of wearable devices, impacts the repeatability of the time reference and thus the accuracy of the time-dependent parameters. This work proposes a simple approach to detect the peak and onset of the ventricular depolarization and demonstrates its lead independence, which makes it suitable for wearable devices even with non-standard leads.
Methods
Our method is grounded in an energy-based approach, which we applied to a) a publicly available dataset with standard 12-lead recordings; and b) a proof-of-concept dataset including a custom non-standard precordial lead implemented on a wearable device.
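For illustration, a generic energy-based peak/onset detector can be sketched as follows; the envelope window, thresholds, and onset rule below are assumptions for a minimal sketch, not the authors' exact algorithm.

```python
import numpy as np

def detect_qrs_energy(ecg, fs, win_ms=120, onset_frac=0.1):
    """Generic energy-envelope QRS detector sketch: square the signal,
    smooth it into an energy envelope, then locate the peak and onset of
    each ventricular depolarization."""
    energy = np.asarray(ecg, dtype=float) ** 2
    win = max(1, int(fs * win_ms / 1000))
    envelope = np.convolve(energy, np.ones(win) / win, mode="same")
    # Candidate beats: rising crossings of an adaptive threshold.
    thr = 0.3 * envelope.max()
    rising = np.flatnonzero(np.diff((envelope > thr).astype(int)) == 1)
    peaks, onsets = [], []
    for start in rising:
        seg = envelope[start:start + int(0.25 * fs)]   # ~250 ms search window
        if seg.size == 0:
            continue
        peak = start + int(np.argmax(seg))
        # Onset: walk back until the envelope drops below a fraction of the peak.
        onset = peak
        while onset > 0 and envelope[onset] > onset_frac * envelope[peak]:
            onset -= 1
        peaks.append(peak)
        onsets.append(onset)
    return np.array(peaks), np.array(onsets)
```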
Results
Compared against the Pan-Tompkins algorithm, our method reduced the absolute error between each lead and the first standard lead by 26 % to 64 % for peak detection and by 70 % to 82 % for onset detection. The achieved consistency across leads is compatible with clinical monitoring. The computational time was also reduced by 65 % to 96 %, making the algorithm suitable for use on microcontroller-based wearable devices.
Conclusions
The proposed method enables the identification of a stable reference for the ventricular depolarization regardless of the choice of lead. The presented results open the way to implementation on wearable devices for chronic disease monitoring.
{"title":"Independence on the lead of the identification of the ventricular depolarization in the electrocardiogram in wearable devices","authors":"Noemi Giordano, Silvia Cannone, Gabriella Balestra, Marco Knaflitz","doi":"10.1016/j.cmpbup.2025.100196","DOIUrl":"10.1016/j.cmpbup.2025.100196","url":null,"abstract":"<div><h3>Goal</h3><div>The home monitoring of cardiac time intervals reduces hospitalization and mortality of cardiovascular patients. However, a reliable time reference in the electrocardiogram is necessary. Nevertheless, the use of different single leads, typical of wearable devices, impacts the repeatability of the time reference and thus the accuracy of the time-dependent parameters. This work proposes a simple approach to detect the peak and onset of the ventricular depolarization, and demonstrates its lead independence, which makes it suitable for wearable devices even with non-standard leads.</div></div><div><h3>Methods</h3><div>Our method grounds on an energy-based approach, which we applied on a) a publicly available dataset with standard 12-lead recordings; b) a proof-of-concept dataset including a custom precordial non-standard lead implemented on a wearable device.</div></div><div><h3>Results</h3><div>Compared against the Pan-Tompkins algorithm, our method reduced the absolute error between each lead and the first standard lead by 26 % to 64 % for the peak, and by 70 % to 82 % for the onset detection. The achieved consistency across leads is compatible with clinical monitoring. The computational time was also reduced by 65 % to 96 %, making the algorithm suitable for use on microcontroller-based wearable devices.</div></div><div><h3>Conclusions</h3><div>The proposed method enables the identification of a stable reference of the ventricular depolarization regardless of the choice of the lead. The presented results open to the implementation on wearable devices for chronic disease monitoring purposes.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"8 ","pages":"Article 100196"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144230049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On finite-time stability of some COVID-19 models using fractional discrete calculus
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100188
Shaher Momani, Iqbal M. Batiha, Issam Bendib, Abeer Al-Nana, Adel Ouannas, Mohamed Dalah
This study investigates the finite-time stability of fractional-order (FO) discrete Susceptible–Infected–Recovered (SIR) models for COVID-19, incorporating memory effects to capture real-world epidemic dynamics. We use discrete fractional calculus to analyze the stability of disease-free and pandemic equilibrium points. The theoretical framework introduces essential definitions, finite-time stability (FTS) criteria, and novel fractional-order modeling insights. Numerical simulations validate the theoretical results under various parameters, demonstrating the finite-time convergence to equilibrium states. Results highlight the flexibility of FO models in addressing delayed responses and prolonged effects, offering enhanced predictive accuracy over traditional integer-order approaches. This research contributes to the design of effective public health interventions and advances in mathematical epidemiology.
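To make the memory-dependent update concrete, the sketch below simulates a Caputo-like fractional-difference SIR map in which each step sums over the whole history with Gamma-function weights. The discretization and parameter values are illustrative assumptions, not the exact scheme or parameters of the paper.

```python
import numpy as np
from scipy.special import gammaln

def frac_discrete_sir(nu, beta, gamma_r, S0, I0, R0, steps):
    """Fractional-order discrete SIR sketch. nu in (0, 1] is the fractional
    order; nu = 1 recovers the memoryless classic map."""
    S, I, R = [S0], [I0], [R0]
    def f(s, i, r):
        n = s + i + r
        return (-beta * s * i / n,
                beta * s * i / n - gamma_r * i,
                gamma_r * i)
    for n in range(1, steps + 1):
        # Memory kernel w_j = Gamma(n-j+nu) / (Gamma(n-j+1) * Gamma(nu)),
        # computed in log-space to avoid overflow.
        j = np.arange(1, n + 1)
        w = np.exp(gammaln(n - j + nu) - gammaln(n - j + 1) - gammaln(nu))
        hist = np.array([f(S[k], I[k], R[k]) for k in range(n)])
        S.append(S0 + np.dot(w, hist[:, 0]))
        I.append(I0 + np.dot(w, hist[:, 1]))
        R.append(R0 + np.dot(w, hist[:, 2]))
    return np.array(S), np.array(I), np.array(R)

# Example: epidemic with memory, order nu = 0.8
S, I, R = frac_discrete_sir(nu=0.8, beta=0.4, gamma_r=0.1,
                            S0=0.99, I0=0.01, R0=0.0, steps=100)
```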
{"title":"On finite-time stability of some COVID-19 models using fractional discrete calculus","authors":"Shaher Momani , Iqbal M. Batiha , Issam Bendib , Abeer Al-Nana , Adel Ouannas , Mohamed Dalah","doi":"10.1016/j.cmpbup.2025.100188","DOIUrl":"10.1016/j.cmpbup.2025.100188","url":null,"abstract":"<div><div>This study investigates the finite-time stability of fractional-order (FO) discrete Susceptible–Infected–Recovered (SIR) models for COVID-19, incorporating memory effects to capture real-world epidemic dynamics. We use discrete fractional calculus to analyze the stability of disease-free and pandemic equilibrium points. The theoretical framework introduces essential definitions, finite-time stability (FTS) criteria, and novel fractional-order modeling insights. Numerical simulations validate the theoretical results under various parameters, demonstrating the finite-time convergence to equilibrium states. Results highlight the flexibility of FO models in addressing delayed responses and prolonged effects, offering enhanced predictive accuracy over traditional integer-order approaches. This research contributes to the design of effective public health interventions and advances in mathematical epidemiology.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"7 ","pages":"Article 100188"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143592378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Resectograms: Planning liver surgery with real-time occlusion-free visualization of virtual resections
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100186
Ruoyan Meng, Davit Aghayan, Egidijus Pelanis, Bjørn Edwin, Faouzi Alaya Cheikh, Rafael Palomar
Background and Objective:
Visualization of virtual resections plays a central role in computer-assisted liver surgery planning. However, the intricate liver anatomical information often results in occlusions and visualization information clutter, which can lead to inaccuracies in virtual resections. To overcome these challenges, we introduce Resectograms, which are planar (2D) representations of virtual resections enabling the visualization of information associated with the surgical plan.
Methods:
Resectograms are computed in real-time and displayed as additional 2D views showing anatomical, functional, and risk-associated information extracted from the 3D virtual resection as it is modified, offering surgeons an occlusion-free visualization of the virtual resection during surgery planning. To generate these 2D views, we explored three flattening methods: fixed-shape, Least Squares Conformal Maps, and As-Rigid-As-Possible. Additionally, we optimized GPU memory usage by downsampling texture objects, ensuring errors remain within limits that surgeons defined as acceptable.
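Of the three flattening methods, the fixed-shape strategy is the simplest to illustrate: the sketch below projects the resection-surface vertices onto their PCA best-fit plane. LSCM and As-Rigid-As-Possible additionally minimize angular or rigidity distortion and require mesh connectivity; they are not reproduced here, and this sketch is a conceptual stand-in, not the paper's implementation.

```python
import numpy as np

def flatten_fixed_shape(points):
    """Project 3D resection-surface vertices onto their best-fit plane
    via PCA, returning a 2D parameterisation and the out-of-plane
    residual as a simple per-vertex distortion measure."""
    centered = points - points.mean(axis=0)
    # Principal axes of the vertex cloud; the two largest span the plane.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    uv = centered @ vt[:2].T          # 2D coordinates in the plane
    residual = centered @ vt[2]       # distance to the plane per vertex
    return uv, residual

# Example: vertices of a gently curved (hypothetical) resection surface
rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(500, 2))
z = 0.1 * xy[:, 0] ** 2               # slight curvature
uv, res = flatten_fixed_shape(np.column_stack([xy, z]))
print(f"max out-of-plane distortion: {np.abs(res).max():.3f}")
```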
Results:
We evaluated Resectograms with experienced surgeons (n = 4, 9–15 years of experience) and assessed the 2D flattening methods with computer and biomedical scientists (n = 11) through visual experiments. Surgeons found Resectograms valuable for enhancing surgical planning effectiveness and accuracy. Among the flattening methods, the Least Squares Conformal Maps and As-Rigid-As-Possible techniques demonstrated similarly low distortion levels, superior to the fixed-shape approach. Our analysis of texture object downsampling showed that it is effective for liver and tumor segmentations, but less so for vessel segmentations.
Conclusions:
This paper presents Resectograms, a novel method for visualizing liver virtual resection plans in 2D, offering an intuitive, occlusion-free representation computable in real-time. Resectograms incorporate multiple information layers, providing comprehensive data for liver surgery planning. We enhanced the visualization through improved 3D-to-2D orientation mapping and distortion-minimizing parameterization algorithms. This research contributes to advancing liver surgery planning tools by offering a more accessible and informative visualization method. The code repository for this work is available at: https://github.com/ALive-research/Slicer-Liver.
{"title":"Resectograms: Planning liver surgery with real-time occlusion-free visualization of virtual resections","authors":"Ruoyan Meng , Davit Aghayan , Egidijus Pelanis , Bjørn Edwin , Faouzi Alaya Cheikh , Rafael Palomar","doi":"10.1016/j.cmpbup.2025.100186","DOIUrl":"10.1016/j.cmpbup.2025.100186","url":null,"abstract":"<div><h3>Background and Objective:</h3><div>Visualization of virtual resections plays a central role in computer-assisted liver surgery planning. However, the intricate liver anatomical information often results in occlusions and visualization information clutter, which can lead to inaccuracies in virtual resections. To overcome these challenges, we introduce <em>Resectograms</em>, which are planar (2D) representations of virtual resections enabling the visualization of information associated with the surgical plan.</div></div><div><h3>Methods:</h3><div>Resectograms are computed in real-time and displayed as additional 2D views showing anatomical, functional, and risk-associated information extracted from the 3D virtual resection as this is modified during planning, offering surgeons an occlusion-free visualization of the virtual resection during surgery planning. To further improve functionality, we explored three flattening methods: fixed-shape, Least Squares Conformal Maps, and As-Rigid-As-Possible, to generate these 2D views. Additionally, we optimized GPU memory usage by downsampling texture objects, ensuring errors remain within acceptable limits as defined by surgeons.</div></div><div><h3>Results:</h3><div>We evaluated Resectograms with experienced surgeons (n = 4, 9-15 years) and assessed 2D flattening methods with computer and biomedical scientists (n = 11) through visual experiments. Surgeons found Resectograms valuable for enhancing surgical planning effectiveness and accuracy. Among flattening methods, Least Squares Conformal Maps and As-Rigid-As-Possible techniques demonstrated similarly low distortion levels, superior to the fixed-shape approach. Our analysis of texture object downsampling revealed effectiveness for liver and tumor segmentations, but less so for vessel segmentations.</div></div><div><h3>Conclusions:</h3><div>This paper presents Resectograms, a novel method for visualizing liver virtual resection plans in 2D, offering an intuitive, occlusion-free representation computable in real-time. Resectograms incorporate multiple information layers, providing comprehensive data for liver surgery planning. We enhanced the visualization through improved 3D-to-2D orientation mapping and distortion-minimizing parameterization algorithms. This research contributes to advancing liver surgery planning tools by offering a more accessible and informative visualization method. The code repository for this work is available at: <span><span>https://github.com/ALive-research/Slicer-Liver</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"7 ","pages":"Article 100186"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143518926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GLAAM and GLAAI: Pioneering attention models for robust automated cataract detection
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100182
Deepak Kumar, Chaman Verma, Zoltán Illés
Background and Objective:
Early detection of eye diseases, especially cataracts, is essential for preventing vision impairment. Accurate and cost-effective cataract diagnosis often requires advanced methods. This study proposes novel deep learning models that integrate global and local attention mechanisms into MobileNet and InceptionV3 architectures to improve cataract detection from fundus images.
Methods:
Two deep learning models, Global–Local Attention Augmented MobileNet (GLAAM) and Global–Local Attention Augmented InceptionV3 (GLAAI), were developed to enhance the analysis of fundus images. The models incorporate a combined attention mechanism to effectively capture deteriorated regions in retinal images. Data augmentation techniques were employed to prevent overfitting during training and testing on two cataract datasets. Additionally, Grad-CAM visualizations were used to increase interpretability by highlighting key regions influencing predictions.
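One plausible reading of a combined global-local attention block — global channel reweighting plus a local spatial map, in the spirit of squeeze-and-excitation and CBAM — is sketched below in PyTorch. The paper's exact module design is not reproduced, and the channel width in the example is an assumption.

```python
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """Sketch: global channel attention followed by local spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.global_att = nn.Sequential(        # global: channel-wise weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.local_att = nn.Sequential(         # local: per-pixel weight map
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.global_att(x)   # reweight channels globally
        x = x * self.local_att(x)    # highlight deteriorated regions locally
        return x

# Example: refine a backbone feature stage (96 channels assumed)
feats = torch.randn(2, 96, 28, 28)
out = GlobalLocalAttention(96)(feats)   # same shape, attention-refined
```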
Results:
The GLAAM model achieved a balanced accuracy of 97.08%, an average precision of 97.11%, and an F1-score of 97.12% on the retinal dataset. Grad-CAM visualizations confirmed the models’ ability to identify crucial cataract-related regions in fundus images.
Conclusion:
This study demonstrates a significant advancement in cataract diagnosis using deep learning, with GLAAM and GLAAI models exhibiting strong diagnostic performance. These models have the potential to enhance diagnostic tools and improve patient care by offering a cost-effective and accurate solution for cataract detection, suitable for integration into clinical settings.
{"title":"GLAAM and GLAAI: Pioneering attention models for robust automated cataract detection","authors":"Deepak Kumar , Chaman Verma , Zoltán Illés","doi":"10.1016/j.cmpbup.2025.100182","DOIUrl":"10.1016/j.cmpbup.2025.100182","url":null,"abstract":"<div><h3>Background and Objective:</h3><div>Early detection of eye diseases, especially cataracts, is essential for preventing vision impairment. Accurate and cost-effective cataract diagnosis often requires advanced methods. This study proposes novel deep learning models that integrate global and local attention mechanisms into MobileNet and InceptionV3 architectures to improve cataract detection from fundus images.</div></div><div><h3>Methods:</h3><div>Two deep learning models, Global–Local Attention Augmented MobileNet (GLAAM) and Global–Local Attention Augmented InceptionV3 (GLAAI), were developed to enhance the analysis of fundus images. The models incorporate a combined attention mechanism to effectively capture deteriorated regions in retinal images. Data augmentation techniques were employed to prevent overfitting during training and testing on two cataract datasets. Additionally, Grad-CAM visualizations were used to increase interpretability by highlighting key regions influencing predictions.</div></div><div><h3>Results:</h3><div>The GLAAM model achieved a balanced accuracy of 97.08%, an average precision of 97.11%, and an F1-score of 97.12% on the retinal dataset. Grad-CAM visualizations confirmed the models’ ability to identify crucial cataract-related regions in fundus images.</div></div><div><h3>Conclusion:</h3><div>This study demonstrates a significant advancement in cataract diagnosis using deep learning, with GLAAM and GLAAI models exhibiting strong diagnostic performance. These models have the potential to enhance diagnostic tools and improve patient care by offering a cost-effective and accurate solution for cataract detection, suitable for integration into clinical settings.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"7 ","pages":"Article 100182"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143474173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection and classification of hypertensive retinopathy based on retinal image analysis using a deep learning approach
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100191
Bambang Krismono Triwijoyo, Ahmat Adil, Muhammad Zulfikri
Background
Most heart attacks and strokes happen unexpectedly to people whose signs of high blood pressure are not identified in time for treatment. This gap makes research on hypertensive retinopathy urgent: an early detection model is needed to improve treatment accuracy and help prevent heart attacks and strokes before they happen.
Methods
This research utilizes secondary data, specifically a retinal image dataset from the open-source Messidor database, which comprises 1200 retinal images, each measuring 1440 × 940 pixels. The dataset is divided into 60 % training and 40 % validation data. The image analysis process then extracts the retinal blood vessels using the Otsu segmentation algorithm, and a morphological approach is used to obtain comprehensive features of the blood vessels around the Optic Disc (OD). This stage aims to extract and sample the artery-to-vein width ratio (AVR). For classification, the research uses a Deep Convolutional Neural Network (DCNN) trained with Leave-one-out cross-validation.
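The vessel-extraction stage can be illustrated with standard tooling. The sketch below applies background subtraction followed by Otsu thresholding on the green channel using scikit-image; the pre-processing details are assumptions rather than the paper's exact pipeline, and the AVR helper only shows how the ratio is formed from sampled widths.

```python
import numpy as np
from skimage import io, filters, morphology

def vessel_mask_otsu(image_path):
    """Sketch of vessel extraction: background subtraction on the green
    channel, Otsu thresholding, and morphological cleanup."""
    rgb = io.imread(image_path)
    green = rgb[:, :, 1].astype(float) / 255.0     # vessels contrast best in green
    background = filters.gaussian(green, sigma=10) # heavily smoothed background
    enhanced = background - green                  # vessels are darker than background
    mask = enhanced > filters.threshold_otsu(enhanced)
    return morphology.remove_small_objects(mask, min_size=64)

def avr_from_widths(artery_widths, vein_widths):
    # AVR: ratio of mean arteriolar to mean venular width near the OD.
    return np.mean(artery_widths) / np.mean(vein_widths)
```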
Results
Testing the model with nine output classes showed the features extracted in each convolutional layer: the second layer extracts the retina and eye blood vessels, the third layer extracts the retinal image texture, and the fourth layer extracts hard exudates, hemorrhages, and cotton wool spots. The Specificity, Recall, Accuracy, and F-Score results are 90 %, 81.82 %, 90 %, and 90 %, respectively.
Conclusions
This research's first contribution is applying the AVR calculation algorithm to build a new dataset with nine class categories. The second is determining the architectural specification of the CNN model: the input size, depth, and number of nodes for each layer, as well as the transfer function, learning rate, and number of epochs, are set by hyperparameter tuning.
{"title":"Detection and classification of hypertensive retinopathy based on retinal image analysis using a deep learning approach","authors":"Bambang Krismono Triwijoyo, Ahmat Adil, Muhammad Zulfikri","doi":"10.1016/j.cmpbup.2025.100191","DOIUrl":"10.1016/j.cmpbup.2025.100191","url":null,"abstract":"<div><h3>Background</h3><div>The issue is that most heart attacks and strokes happen unexpectedly to people who have signs of high blood pressure that are not identified in time for treatment. These gap factors make the research on hypertensive retinopathy urgent since it requires an early detection model to improve treatment accuracy and prevent heart attacks and strokes before they happen.</div></div><div><h3>Methods</h3><div>This research utilizes secondary data, specifically a retinal image dataset from the open-source Messidor database. This database comprises 1200 retinal images, each measuring 1440 × 940 pixels. The dataset is divided into 60 % training and 40 % validation data. The next step is the image analysis process, which involves extracting retinal blood vessels using the Otsu segmentation algorithm. A Morphological Approach is used to obtain comprehensive features of the blood vessels around the Optic Disc (OD). This stage aims to extract and sample the comparison between the width of the artery and vein (AVR). This research uses a Deep Convolutional Neural Network (DCNN) classification model with cross-validation training using the Leave-one-out method.</div></div><div><h3>Results</h3><div>The results of testing the model with nine output classes, the features extracted in each convolutional layer, the second layer successfully extracts the retina and eye blood vessels, the third layer extracts the retinal image texture, and the fourth layer extracts hard exudates, hemorrhages, and cotton wool spots. Meanwhile, the Specificity, Recall, Accuracy, and F-Score results are 90 %, 81.82 %, 90 %, and 90 %, respectively.</div></div><div><h3>Conclusions</h3><div>This research's findings first include applying the AVR calculation algorithm to build a new dataset with 9 class categories. Second, the architectural specifications of the CNN model are determined, and the input size, depth, and number of nodes for each layer, as well as the transfer function, learning rate, and number of epochs, are set by adjusting hyperparameters.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"7 ","pages":"Article 100191"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143941186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acoustic cues for person identification using cough sounds
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100195
Van-Thuan Tran, Ting-Hao You, Wei-Ho Tsai
Objectives
This study presents an improved approach to person identification (PID) using nonverbal vocalizations, focusing specifically on cough sounds as a biometric modality. While recent works have demonstrated the feasibility of cough-based PID (CPID), most report accuracies around 80–90 % and could face limitations in terms of model efficiency, generalization, or robustness. Our objective is to advance CPID performance through compact model design and more effective training strategies.
Methods
We collected a custom dataset from 19 subjects and developed a lightweight yet effective deep learning framework for CPID. The proposed architecture, CoughCueNet, is a convolutional recurrent neural network designed to capture both spatial and temporal patterns in cough sounds. The training process incorporates a hybrid loss function that combines supervised contrastive (SC) learning and cross-entropy (CE) loss to enhance feature discrimination. We systematically evaluated multiple acoustic representations, including MFCCs and spectrograms, to identify the most suitable features. We also applied data augmentation for robustness and investigated cross-modal transferability by testing speech-trained models on cough data.
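The hybrid loss pairs the standard supervised contrastive formulation (Khosla et al., 2020) with cross-entropy. A minimal PyTorch sketch follows; the temperature and mixing weight are assumptions, and the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss: pull same-label embeddings together,
    push different-label embeddings apart."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                       # (B, B) similarities
    off_diag = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & off_diag
    # Log-probability of each pair, excluding self-similarity.
    log_prob = sim - torch.logsumexp(sim.masked_fill(~off_diag, -1e9),
                                     dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1).clamp_min(1)
    return -(log_prob * pos_mask).sum(1).div(pos_counts).mean()

def hybrid_sc_ce_loss(logits, embeddings, labels, alpha=0.5):
    # Weighted SC + CE combination; alpha is an assumed mixing weight.
    return (alpha * supcon_loss(embeddings, labels)
            + (1 - alpha) * F.cross_entropy(logits, labels))
```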
Results
Our CPID system achieved a mean identification accuracy of 97.18 %. Training the proposed CoughCueNet using a hybrid SC+CE loss function consistently improved model generalization and robustness. It outperformed the same network and larger-capacity networks (i.e., VGG16 and ResNet50) trained with CE loss alone, which achieved accuracies around 90 %. Among the tested features, MFCCs yielded superior identification performance over spectrograms. Experiments with speech-trained models tested on coughs revealed limited cross-vocal transferability, emphasizing the need for cough-specific models.
Conclusion
This work advances the state of cough-based PID by demonstrating that high-accuracy identification is achievable using compact models and hybrid training strategies. It establishes cough sounds as a practical and distinctive biometric modality, with promising applications in security, user authentication, and health monitoring, particularly in environments where speech-based systems are less reliable or infeasible.
{"title":"Acoustic cues for person identification using cough sounds","authors":"Van-Thuan Tran, Ting-Hao You, Wei-Ho Tsai","doi":"10.1016/j.cmpbup.2025.100195","DOIUrl":"10.1016/j.cmpbup.2025.100195","url":null,"abstract":"<div><h3>Objectives</h3><div>This study presents an improved approach to person identification (PID) using nonverbal vocalizations, focusing specifically on cough sounds as a biometric modality. While recent works have demonstrated the feasibility of cough-based PID (CPID), most report accuracies around 80–90 % and could face limitations in terms of model efficiency, generalization, or robustness. Our objective is to advance CPID performance through compact model design and more effective training strategies.</div></div><div><h3>Methods</h3><div>We collected a custom dataset from 19 subjects and developed a lightweight yet effective deep learning framework for CPID. The proposed architecture, CoughCueNet, is a convolutional recurrent neural network designed to capture both spatial and temporal patterns in cough sounds. The training process incorporates a hybrid loss function that combines supervised contrastive (SC) learning and cross-entropy (CE) loss to enhance feature discrimination. We systematically evaluated multiple acoustic representations, including MFCCs and spectrograms, to identify the most suitable features. We also applied data augmentation for robustness and investigated cross-modal transferability by testing speech-trained models on cough data.</div></div><div><h3>Results</h3><div>Our CPID system achieved a mean identification accuracy of 97.18 %. Training the proposed CoughCueNet using a hybrid SC+CE loss function consistently improved model generalization and robustness. It outperformed the same network and larger-capacity networks (i.e., VGG16 and ResNet50) trained with CE loss alone, which achieved accuracies around 90 %. Among the tested features, MFCCs yielded superior identification performance over spectrograms. Experiments with speech-trained models tested on coughs revealed limited cross-vocal transferability, emphasizing the need for cough-specific models.</div></div><div><h3>Conclusion</h3><div>This work advances the state of cough-based PID by demonstrating that high-accuracy identification is achievable using compact models and hybrid training strategies. It establishes cough sounds as a practical and distinctive biometric modality, with promising applications in security, user authentication, and health monitoring, particularly in environments where speech-based systems are less reliable or infeasible.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"8 ","pages":"Article 100195"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144230050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust lung segmentation in Chest X-ray images using modified U-Net with deeper network and residual blocks
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100211
Wiley Tam, Paul Babyn, Javad Alirezaie
Lung diseases remain a leading cause of mortality worldwide, as evidenced by statistics from the World Health Organization (WHO). The limited availability of radiologists to interpret Chest X-ray (CXR) images for diagnosing common lung conditions poses a significant challenge, often resulting in delayed diagnosis and treatment. In response, Computer-Aided Diagnostic (CAD) tools can be used to potentially streamline and expedite the diagnostic process. Recently, deep learning techniques have gained prominence in the automated analysis of CXR images, particularly in segmenting lung regions as a critical preliminary step. This study aims to develop and evaluate a lung segmentation model based on a modified U-Net architecture. The architecture leverages techniques such as transfer learning with DenseNet201 as a feature extractor alongside dilated convolutions and residual blocks. An ablation study was conducted to evaluate these architectural components, along with additional elements like augmented data, alternative backbones, and attention mechanisms. Numerous and extensive experiments were performed on two publicly available datasets, the Montgomery County (MC) and Shenzhen Hospital (SH) datasets, to validate the efficacy of these techniques on segmentation performance. Outperforming other state-of-the-art methods on the MC dataset, the proposed model achieved a Jaccard Index (IoU) of 97.77 and a Dice Similarity Coefficient (DSC) of 98.87. These results represent a significant improvement over the baseline U-Net, with gains of 3.37% and 1.75% in IoU and DSC, respectively. These findings highlight the importance of architectural enhancements in deep learning-based lung segmentation models, contributing to more efficient, accurate, and reliable CAD systems for lung disease assessment.
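A dilated residual block of the kind described — dilation to widen the receptive field at full resolution, with an identity skip to ease gradient flow — can be sketched as follows; its exact placement and channel widths in the modified U-Net are assumptions.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Sketch: two dilated 3x3 convolutions with an identity skip."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))   # identity skip + dilated features

# Example: refine decoder features before the segmentation head
x = torch.randn(1, 64, 128, 128)
print(DilatedResidualBlock(64)(x).shape)    # torch.Size([1, 64, 128, 128])
```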
{"title":"Robust lung segmentation in Chest X-ray images using modified U-Net with deeper network and residual blocks","authors":"Wiley Tam , Paul Babyn , Javad Alirezaie","doi":"10.1016/j.cmpbup.2025.100211","DOIUrl":"10.1016/j.cmpbup.2025.100211","url":null,"abstract":"<div><div>Lung diseases remain a leading cause of mortality worldwide, as evidenced by statistics from the World Health Organization (WHO). The limited availability of radiologists to interpret Chest X-ray (CXR) images for diagnosing common lung conditions poses a significant challenge, often resulting in delayed diagnosis and treatment. In response, Computer-Aided Diagnostic (CAD) tools can be used to potentially streamline and expedite the diagnostic process. Recently, deep learning techniques have gained prominence in the automated analysis of CXR images, particularly in segmenting lung regions as a critical preliminary step. This study aims to develop and evaluate a lung segmentation model based on a modified U-Net architecture. The architecture leverages techniques such as transfer learning with DenseNet201 as a feature extractor alongside dilated convolutions and residual blocks. An ablation study was conducted to evaluate these architectural components, along with additional elements like augmented data, alternative backbones, and attention mechanisms. Numerous and extensive experiments were performed on two publicly available datasets, the Montgomery County (MC) and Shenzhen Hospital (SH) datasets, to validate the efficacy of these techniques on segmentation performance. Outperforming other state-of-the-art methods on the MC dataset, the proposed model achieved a Jaccard Index (IoU) of 97.77 and a Dice Similarity Coefficient (DSC) of 98.87. These results represent a significant improvement over the baseline U-Net, with gains of 3.37% and 1.75% in IoU and DSC, respectively. These findings highlight the importance of architectural enhancements in deep learning-based lung segmentation models, contributing to more efficient, accurate, and reliable CAD systems for lung disease assessment.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"8 ","pages":"Article 100211"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144780982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel deep learning-based spider wasp optimization approach for enhancing brain tumor detection and physical therapy prediction
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100193
Suleiman Daoud, Ahmad Nasayreh, Khalid M.O. Nahar, Wlla k. Abedalaziz, Salem M. Alayasreh, Hasan Gharaibeh, Ayah Bashkami, Amer Jaradat, Sultan Jarrar, Hammam Al-Hawamdeh, Absalom E. Ezugwu, Raed Abu Zitar, Aseel Smerat, Vaclav Snasel, Laith Abualigah
A brain tumor, one of the deadliest disorders, is characterized by the abnormal growth of cells in the brain. Early detection can improve brain tumor diagnosis, and accurate diagnosis is essential for effective treatment. Researchers have developed several deep-learning classification methods to diagnose brain tumors. Moreover, these tumors can significantly impair physical activity and present a broad spectrum of symptoms, so each patient requires an individualized physical therapy treatment plan tailored to their specific needs. However, challenges remain, including the need for expert knowledge in classifying brain tumors using deep learning models and the difficulty of creating the most accurate deep learning model for brain tumor classification. To address these challenges, we present a highly accurate and efficient methodology based on advanced metaheuristic algorithms and deep learning. To identify different types of pediatric brain tumors, we develop an optimal residual learning architecture. We also present the Spider Wasp Optimization (SWO) algorithm, which aims to improve performance through feature selection and enhances the effectiveness of optimization by balancing convergence speed and solution diversity. We first convert the algorithm from continuous to binary, combine it with the K-Nearest Neighbor (KNN) algorithm for classification, and evaluate it on a dataset of brain MRI images collected from King Abdullah Hospital. In terms of metrics such as accuracy, sensitivity, specificity, and F1-score, it outperformed other conventional algorithms. We demonstrate the overall effectiveness of the proposed model by using it to select the optimal features extracted from the ResNet50V2 model for pediatric brain tumor detection. We compared the proposed SWO+KNN model with other deep learning architectures, such as MobileNetV2 and ResNet50V2, and machine learning algorithms, such as KNN, Support Vector Machine (SVM), and Random Forest (RF). The experimental results indicate that the proposed SWO+KNN model outperforms other well-established deep learning models and previous studies, achieving accuracy rates of 97.5 % and 95.5 % for binary and multiclass classification, respectively. The results clearly demonstrate the ability of the proposed SWO+KNN model to accurately classify brain tumors.
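The continuous-to-binary conversion with a KNN fitness can be illustrated by a generic wrapper. In the sketch below a sigmoid transfer function binarizes agent positions and a random perturbation stands in for the SWO update equations, which are not reproduced here; all parameters are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def binarize(position, rng):
    # Sigmoid transfer function: continuous position -> binary feature mask.
    return (1 / (1 + np.exp(-position))) > rng.random(len(position))

def fitness(mask, X, y, w_acc=0.99):
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(5), X[:, mask], y, cv=3).mean()
    # Reward accuracy, lightly penalise the number of selected features.
    return w_acc * acc + (1 - w_acc) * (1 - mask.mean())

def wrapper_feature_selection(X, y, agents=10, iters=20, seed=0):
    """Generic binary metaheuristic wrapper with KNN fitness (a stand-in
    for the SWO dynamics)."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(size=(agents, X.shape[1]))
    best_mask, best_fit = None, -np.inf
    for _ in range(iters):
        for i in range(agents):
            mask = binarize(pos[i], rng)
            f = fitness(mask, X, y)
            if f > best_fit:
                best_fit, best_mask = f, mask
        pos += rng.normal(scale=0.5, size=pos.shape)  # stand-in update step
    return best_mask, best_fit
```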
{"title":"A novel deep learning-based spider wasp optimization approach for enhancing brain tumor detection and physical therapy prediction","authors":"Suleiman Daoud , Ahmad Nasayreh , Khalid M.O. Nahar , Wlla k. Abedalaziz , Salem M. Alayasreh , Hasan Gharaibeh , Ayah Bashkami , Amer Jaradat , Sultan Jarrar , Hammam Al-Hawamdeh , Absalom E. Ezugwu , Raed Abu Zitar , Aseel Smerat , Vaclav Snasel , Laith Abualigah","doi":"10.1016/j.cmpbup.2025.100193","DOIUrl":"10.1016/j.cmpbup.2025.100193","url":null,"abstract":"<div><div>A brain tumor, one of the deadliest disorders, is characterized by the abnormal growth of synapses in the brain. Early detection can improve brain tumor diagnosis, and accurate diagnosis is essential for effective treatment. Researchers have developed several deep-learning classification methods to diagnose brain tumors. Moreover, these types of tumorscan significantly impair physical activity, presenting a broad spectrum of symptoms. As a result, each patient requires an individualized physical therapy treatment plan tailored to their specific needs. However, some challenges remain, including the need for a competent expert in classifying brain tumors using deep learning models, as well as the challenge of creating the most accurate deep learning model for brain tumor classification. To address these challenges, we present a highly accurate and efficient methodology based on advanced metaheuristic algorithms and deep learning. To identify different types of pediatric brain tumors, we specifically develop an optimal residual learning architecture. We also present the Spider Wasp Optimization (SWO) algorithm, which aims to improve performance by feature selection. The algorithm enhances the effectiveness of optimization by balancing the speed of convergence and diversity of solutions. We first convert the algorithm from continuous to binary, combine it with the K-Nearest Neighbor (KNN) algorithm for classification, and evaluate it on a dataset of brain MRI images collected from King Abdullah Hospital. Our analysis revealed that in terms of metrics such as accuracy, sensitivity, specificity, and f1-score, it outperformed other conventional algorithms. We demonstrate the overall effectiveness of the proposed model by using it to select the optimal features extracted from the Resnet50V2 model for pediatric brain tumor detection. We compared the proposed SWO+KNN model with other deep learning architectures such as MobileNetV2, Resnet50V2, and machine learning algorithms such as KNN, Support Vector Machine SVM, and Random Forest (RF). The experimental results indicate that the proposed SWO+KNN model outperforms other well-established deep learning models and previous studies. SWO+KNN achieved accuracy rates of 97.5 % and 95.5 % for both binary classification and multiclass classification, respectively. 
The results clearly demonstrate the ability of the proposed SWO+KNN model to accurately classify brain tumors.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"7 ","pages":"Article 100193"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144169362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing stroke prediction models: A mixing of data augmentation and transfer learning for small-scale dataset in machine learning
Pub Date : 2025-01-01 DOI: 10.1016/j.cmpbup.2025.100198
Imam Tahyudin, Ade Nurhopipah, Ades Tikaningsih, Puji Lestari, Yaya Suryana, Edi Winarko, Eko Winarto, Nazwan Haza, Hidetaka Nambo
Machine learning is a powerful technique for analysing datasets and making data-driven recommendations. In general, however, the performance of machine learning in recognising patterns scales with the size of the dataset, and in some fields, such as medicine, acquiring each instance of a dataset requires substantial effort and budget. Therefore, additional data acquisition techniques are needed to increase data size and improve model quality.
This study applied Data Augmentation and Transfer Learning to address the small-scale dataset problem in analyzing stroke patient records from the Banyumas Regional General Hospital (RSUD Banyumas). The records are used to predict the patient's status when discharged from the hospital. The research compared the prediction accuracy of three solutions: Data Augmentation, Transfer Learning, and a mix of both methods. Four classification algorithms were employed: Random Forest, Support Vector Machine, Gradient Boosting, and Extreme Gradient Boosting. We implemented the Synthetic Minority Over-sampling Technique for Nominal and Continuous (SMOTE-NC) to generate the artificial dataset. In the Transfer Learning process, we used a benchmark stroke dataset with a different target than ours, so we relabelled it based on the nearest neighbours of the original dataset. Applying Data Augmentation proved to be a good decision, as it led to better performance than using only the original dataset, whereas Transfer Learning alone did not give satisfying results for XGBoost and SVM. Mixing Data Augmentation and Transfer Learning provided the best performance, with accuracy and recall both 0.813, precision of 0.853497, and an F1-score of 0.826628, given by the Random Forest model. The research can contribute significantly to developing better classification models so physicians can obtain more accurate information and treat stroke cases more effectively and efficiently.
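The augmentation step names SMOTE-NC explicitly. A minimal sketch of that pipeline follows, using imbalanced-learn's SMOTENC to oversample the minority class before fitting a Random Forest; the toy data stand in for the hospital records, which are not public, and the column layout is an assumption.

```python
import numpy as np
from imblearn.over_sampling import SMOTENC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Toy stand-in data: columns 0-1 numeric, column 2 categorical.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=300), rng.normal(size=300),
                     rng.integers(0, 3, size=300)])
y = (rng.random(300) < 0.2).astype(int)        # imbalanced discharge status

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# SMOTE-NC oversamples the minority class while respecting categorical columns.
smote = SMOTENC(categorical_features=[2], random_state=0)
X_aug, y_aug = smote.fit_resample(X_tr, y_tr)

clf = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
print(classification_report(y_te, clf.predict(X_te)))
```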
{"title":"Enhancing stroke prediction models: A mixing of data augmentation and transfer learning for small-scale dataset in machine learning","authors":"Imam Tahyudin , Ade Nurhopipah , Ades Tikaningsih , Puji Lestari , Yaya Suryana , Edi Winarko , Eko Winarto , Nazwan Haza , Hidetaka Nambo","doi":"10.1016/j.cmpbup.2025.100198","DOIUrl":"10.1016/j.cmpbup.2025.100198","url":null,"abstract":"<div><div>Machine learning is a powerful technique for analysing datasets and making data-driven recommendations. However, in general, the performance of machine learning in recognising patterns is proportional to the size of the dataset. On the other hand, in some cases, such as in the medical field, providing an instance of a dataset takes a lot of work and budget. Therefore, additional data acquisition techniques are needed to increase data size and improve model quality.</div><div>This study applied Data Augmentation and Transfer Learning to solve small-scale dataset problems in analyzing stroke patient information in The Banyumas Regional General Hospital (RSUD Banyumas). The information is utilized to predict the patient's status when discharged from the hospital. The research compared the prediction accuracy from three solutions: Data Augmentation, Transfer Learning, and the mixing of both methods. The classification models employed in this study were four algorithms: Random Forest, Support Vector Machine, Gradient Boosting, and Extreme Gradient Boosting. We implemented the Synthetic Minority Over-sampling Technique for Nominal and Continuous to generate the artificial dataset. In the Transfer Learning process, we used a benchmark stroke dataset with a different target than ours, so we labelled it based on the nearest neighbours of the original dataset. Applying Data Augmentation in this study is a good decision because it leads to better performance than using only the original dataset. However, implementing the Transfer Learning technique does not give a satisfying result for XGBoost and SVM. Mixing Data Augmentation and Transfer Learning provides the best performance with accuracy and recall, both 0.813, the precision of 0.853497, and the F-1 score of 0.826628 given by the Random Forest model. The research can contribute significantly to developing better classification models so physicians can obtain more accurate information and help treat stroke cases more effectively and efficiently.</div></div>","PeriodicalId":72670,"journal":{"name":"Computer methods and programs in biomedicine update","volume":"8 ","pages":"Article 100198"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144500858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}