Pub Date: 2025-03-06 DOI: 10.1016/j.compbiomed.2025.109916
Hatice Catal Reis, Veysel Turk
Cancer is a severe threat to public health. Early diagnosis is critical, but it is hindered by a shortage of experts, the subjectivity of individual assessment, clinical workload, and the high similarity between disease classes. In recent years, deep learning-based artificial intelligence models have shown promise for increasing the speed and accuracy of diagnosis, owing to their ability to learn and adapt automatically. This study proposes the deep learning-based PADBSRNet model and a hybrid PADBSRNet-Vision Transformer (ViT) method for detecting brain tumors, skin cancer, and lung cancer. PADBSRNet is a deep neural network architecture that integrates separable and traditional convolution layers, multiple attention mechanisms, bidirectional recurrent neural networks, and cross-connection/multi-stage feature fusion strategies. This architecture offers advantages in extracting local, global, and contextual features and in modeling long-term dependencies in image classification tasks. The second approach is a hybrid method that combines the advantages of PADBSRNet and ViT. The proposed models were evaluated on the Figshare Brain Tumor Dataset, the IQ-OTH/NCCD Dataset, and the Skin Cancer: Malignant vs. Benign Dataset, on which PADBSRNet achieved accuracies of 95.24%, 99.55%, and 88.61%, respectively. These findings indicate that the proposed model can learn the complex relationships and hidden patterns of cancer, producing applicable and effective results for cancer diagnosis.
Title: A multi-stage fusion deep learning framework merging local patterns with attention-driven contextual dependencies for cancer detection
Journal: Computers in Biology and Medicine, Volume 189, Article 109916
Pub Date: 2025-03-06 DOI: 10.1016/j.compbiomed.2025.109962
Javed Aalam, Syed Naseer Ahmad Shah, Rafat Parveen
Infectious diseases, including tuberculosis (TB), HIV/AIDS, and emerging pathogens like COVID-19, pose severe global health challenges due to their rapid spread and significant morbidity and mortality rates. Next-generation sequencing (NGS) and machine learning (ML) have emerged as transformative technologies for enhancing disease diagnosis and management.
Objective
This review explores the integration of ML techniques with NGS for diagnosing infectious diseases, highlighting their effectiveness and identifying existing challenges.
Methods
A comprehensive literature review spanning the past decade was conducted using reputable databases, including IEEE Xplore, PubMed, Scopus, SpringerLink, and ScienceDirect. Research papers, articles, and conference proceedings meeting stringent quality criteria were analysed to assess the performance of ML algorithms applied to NGS and metagenomic NGS (mNGS) data.
Results
The findings reveal that ML algorithms, such as deep neural networks (DNNs), support vector machines (SVM), and K-nearest neighbours (KNN), achieve high accuracy rates, often exceeding 95 %, in diagnosing infectious diseases. Deep learning methods excel in genomic and metagenomic data analysis, while traditional algorithms like Gaussian mixture models (GMM) also demonstrate robust classification capabilities. Challenges include reliance on single data types and difficulty distinguishing closely related pathogens.
Conclusion
The integration of ML and NGS significantly advances infectious disease diagnosis, offering rapid and precise detection capabilities. Addressing current limitations can further enhance the effectiveness of these technologies, ultimately improving global public health outcomes.
Title: An extensive review on infectious disease diagnosis using machine learning techniques and next generation sequencing: State-of-the-art and perspectives
Journal: Computers in Biology and Medicine, Volume 189, Article 109962
Pub Date: 2025-03-06 DOI: 10.1016/j.compbiomed.2025.109928
Sagnik De, Prithwijit Mukherjee, Anisha Halder Roy
Low Back Pain (LBP) is the most prevalent musculoskeletal condition worldwide and a leading cause of disability, significantly affecting mobility, work productivity, and overall quality of life. Due to its high prevalence and substantial economic burden, LBP presents a critical global public health challenge that demands innovative diagnostic and therapeutic solutions. This study introduces a novel deep-learning approach for diagnosing LBP intensity using electroencephalography (EEG) signals and surface electromyography (sEMG) signals from back muscles. A GAN-Convolution-Transformer-based model, named GLEAM (GAN-ConvoLution-sElf Attention-ETLSTM), is designed to classify LBP intensity into four categories: no LBP, mild LBP, moderate LBP, and intolerable LBP. A denoising GAN is central to the model’s functionality, playing a pivotal role in enhancing the quality of EEG and sEMG signals by removing noise, resulting in cleaner and more accurate input data. Various features are extracted from the GAN-denoised EEG and sEMG signals, and the combined features from both EEG and sEMG are used for LBP detection. After the feature extraction, the CNN is employed to capture local temporal patterns within the data, allowing the model to focus on smaller, region-specific trends in the signals. Subsequently, the self-attention module identifies global correlations among these locally extracted features, enhancing the model’s ability to recognize broader patterns. The proposed ETLSTM network performs the final classification, which achieves an impressive LBP detection accuracy of 98.95%. This research presents several innovative contributions: (i) the development of a novel denoising GAN for cleaning EEG and sEMG signals, (ii) the design and integration of a new ETLSTM architecture as a classifier within the GLEAM model, and (iii) the introduction of the GLEAM hybrid deep learning framework, which enables robust and reliable LBP intensity assessment.
Title: GLEAM: A multimodal deep learning framework for chronic lower back pain detection using EEG and sEMG signals
Journal: Computers in Biology and Medicine, Volume 189, Article 109928
Pub Date: 2025-03-06 DOI: 10.1016/j.compbiomed.2025.109904
Matthew Fynn, Kayapanda Mandana, Javed Rashid, Sven Nordholm, Yue Rong, Goutam Saha
The leading cause of mortality and morbidity worldwide is cardiovascular disease (CVD), with coronary artery disease (CAD) being the largest sub-category. Unfortunately, myocardial infarction or stroke can manifest as the first symptom of CAD, underscoring the crucial importance of early disease detection. Hence, there is a global need for a cost-effective, non-invasive, reliable, and easy-to-use system to pre-screen CAD. Previous studies have explored weak murmurs arising from CAD for classification using phonocardiogram (PCG) signals. However, these studies often involve tedious and inconvenient data collection methods, requiring precise subject preparation and environmental conditions. This study proposes using a novel data acquisition system (DAQS) designed for simplicity and convenience. The DAQS incorporates multi-channel PCG sensors into a wearable vest. The entire signal acquisition process can be completed in under two minutes, from fitting the vest to recording signals and removing it, requiring no specialist training. This exemplifies the potential for mass screening, which is impractical with current state-of-the-art protocols. Seven PCG signals are acquired, six from the chest and one from the subject’s back, marking a novel approach. Our classification approach, which utilizes linear-frequency cepstral coefficients (LFCC) as features and employs a support vector machine (SVM) to distinguish between normal and CAD-affected heartbeats, outperformed alternative low-computational methods suitable for portable applications. Utilizing feature-level fusion, multiple channels are combined, and the optimal combination yields the highest subject-level accuracy and F1-score of 80.44% and 81.00%, respectively, representing a 7% improvement over the best-performing single channel. The proposed system’s performance metrics have been demonstrated to be clinically significant, making the DAQS suitable for practical use. Moreover, the system shows promise in post-procedural monitoring for subjects undergoing percutaneous transluminal coronary angioplasty (PTCA) or coronary artery bypass grafting (CABG), effectively identifying cases of restenosis following intervention.
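The feature-level fusion and subject-level scoring mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the channel names and feature values are hypothetical placeholders for LFCC vectors, and the abstract does not say how heartbeat-level predictions are turned into a subject-level decision, so the majority vote used here is an assumption.

```python
from collections import Counter


def fuse_channels(features_by_channel, channels):
    """Feature-level fusion: concatenate the feature vectors of the chosen channels."""
    fused = []
    for channel in channels:
        fused.extend(features_by_channel[channel])
    return fused


def subject_label(beat_labels):
    """Assumed aggregation rule: the most frequent heartbeat label wins."""
    return Counter(beat_labels).most_common(1)[0][0]


# Hypothetical per-channel features for one heartbeat (stand-ins for LFCCs).
features = {
    "chest_1": [0.1, 0.2],
    "chest_2": [0.3, 0.4],
    "back": [0.5, 0.6],
}
fused_vector = fuse_channels(features, ["chest_1", "back"])
decision = subject_label(["CAD", "normal", "CAD"])
```

The fused vector is what a classifier such as an SVM would receive in place of a single-channel vector; which channel combination works best is an empirical question that the abstract answers only in aggregate.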
Title: Practicality meets precision: Wearable vest with integrated multi-channel PCG sensors for effective coronary artery disease pre-screening
Journal: Computers in Biology and Medicine, Volume 189, Article 109904
Pub Date: 2025-03-05 DOI: 10.1016/j.compbiomed.2025.109966
Arjun Thakur, Pradyumna Agasthi, Chieh-Ju Chao, Juan Maria Farina, David R. Holmes, David Fortuin, Chadi Ayoub, Reza Arsanjani, Imon Banerjee
Predicting post-Percutaneous Coronary Intervention (PCI) outcomes is crucial for effective patient management and quality improvement in healthcare. However, achieving accurate predictions requires the integration of multimodal clinical data, including physiological signals, demographics, and patient history, to estimate prognosis. The integration of such high-dimensional, multi-modal data presents a significant challenge due to its complexity and the need for sophisticated analytical methods.
Our study compares the performance of a state-of-the-art vision transformer (ViT) with a novel multi-branch CNN model with block attention for multimodal data analysis in a joint fusion framework. As a counterpart to the ViT, we propose a new joint fusion architecture that consists of a convolutional neural network (CNN) with a convolutional block attention module (CBAM).
We integrate images of electrocardiogram (ECG) data and tabular electronic health records (EHR) of 13,064 subjects, using 6871 samples for training and 6193 for testing (stratified sampling), to predict three clinically relevant post-PCI (6 months) endpoints: heart failure, all-cause mortality, and stroke. The learned representations are combined at an intermediate layer and then processed by a fully connected layer. The proposed model achieves the highest AUROC scores of 0.849, 0.913, and 0.794 for predicting heart failure, all-cause mortality, and stroke, respectively. The proposed CNN + CBAM fusion model surpasses the baseline EHR model and the ViT for heart failure prediction (DeLong's test p-value = 0.043), which highlights the importance of preserving local spatial features via CNN low-level filters and semi-global dependency using block attention.
Without using any laboratory test results and vital data, we obtained state-of-the-art performance by applying the proposed attention-based CNN model directly to ECG images, outperforming the ViT baseline. The proposed multimodal integration strategy could lead to the development of more accurate, multimodal data-driven models for predicting PCI outcomes. As a result, cardiologists could better tailor treatment plans, optimize patient management strategies, and improve overall clinical outcomes after the complex PCI procedure.
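The AUROC values reported in this abstract measure how often a model ranks a patient who experienced the endpoint above one who did not. A minimal Python sketch of that definition follows; it uses the pairwise (Mann-Whitney) formulation, and the labels and scores are made-up toy values, not data from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: 1 = endpoint occurred, 0 = it did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
```

The quadratic loop is chosen for readability; a rank-based implementation would be preferable for a test set of several thousand subjects.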
Title: Joint fusion of EHR and ECG data using attention-based CNN and ViT for predicting adverse clinical endpoints in percutaneous coronary intervention patients
Journal: Computers in Biology and Medicine, Volume 189, Article 109966
Pub Date: 2025-03-05 DOI: 10.1016/j.compbiomed.2025.109939
Siting Li, Maxwell Levis, Monica DiMambro, Weiyi Wu, Joshua Levy, Brian Shiner, Jiang Gui
Objective
Suicide risk assessment has historically relied heavily on clinical evaluations and patient self-reports. Natural language processing (NLP) of electronic health records (EHRs) provides an alternative approach for extracting risk predictors from clinical notes. Modeling NLP variables, however, is challenging because of zero inflation and skewed distributions. Therefore, we evaluated whether an adaptive-mixture-categorization (AMC) method could optimize the suicide risk predictive capacity of NLP data extracted from Veterans Affairs (VA) EHR notes.
Methods
NLP variables for 25,342 patients were analyzed using the SÉANCE Python package. The AMC method was employed to categorize NLP measures into distinct groups to maximize the between-category variance. Associations between suicide outcomes and AMC-categorized NLP variables were compared to those between the original and quantile-categorized NLP variables.
Results
AMC-categorized variables showed stronger associations with suicide risk than other approaches did in the full cohort analysis and sensitivity analyses by subsampling bootstrapping. Additionally, over 90 % of the NLP variables were significantly associated with suicide risk in univariate analyses, indicating the relevance of clinical notes in suicide prevention.
Conclusion
AMC-based categorization substantially enhanced the suicide predictive capacity of NLP variables extracted from clinical text. Transforming skewed NLP data with the AMC method holds promise for improving risk prediction models.
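The idea of choosing categories that maximize the between-category variance can be illustrated with a small Python sketch. This is not the AMC algorithm itself, whose details the abstract does not give; it is a simplified two-category search over candidate thresholds, and the function names and sample values are hypothetical.

```python
def between_category_variance(groups):
    """Size-weighted variance of the group means around the overall mean."""
    values = [v for group in groups for v in group]
    overall_mean = sum(values) / len(values)
    weighted = sum(
        len(group) * (sum(group) / len(group) - overall_mean) ** 2
        for group in groups
        if group
    )
    return weighted / len(values)


def best_two_way_split(values):
    """Return the threshold t that maximizes the between-category variance
    of the two groups {v <= t} and {v > t}; None if all values are equal."""
    candidates = sorted(set(values))[:-1]  # drop the maximum so both groups are non-empty
    best_threshold, best_score = None, -1.0
    for t in candidates:
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        score = between_category_variance([low, high])
        if score > best_score:
            best_threshold, best_score = t, score
    return best_threshold


# Hypothetical zero-inflated, right-skewed variable.
skewed = [0, 0, 0, 0, 1, 1, 2, 9, 10, 11]
threshold = best_two_way_split(skewed)
```

On skewed data like this, a quantile cut would split the sample by count regardless of where the values cluster, whereas the variance-based search places the boundary between the low and high clusters.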
Title: Preprocessing of natural language process variables using a data-driven method improves the association with suicide risk in a large veterans affairs population
Journal: Computers in Biology and Medicine, Volume 189, Article 109939
Pub Date: 2025-03-05 DOI: 10.1016/j.compbiomed.2025.109888
Martin Kukrál, Duc Thien Pham, Josef Kohout, Štefan Kohek, Marek Havlík, Dominika Grygarová
Electroencephalography (EEG) experiments typically generate vast amounts of data due to the high sampling rates and the use of multiple electrodes to capture brain activity. Consequently, storing and transmitting these large datasets is challenging, necessitating specialized compression techniques tailored to this data type. This study proposes one such method. At its core, an artificial neural network (specifically, a convolutional autoencoder) learns latent representations of the modelled EEG signals to perform lossy compression. The result is then refined with lossless corrections based on a user-defined threshold for the maximum tolerable amplitude loss, yielding a flexible near-lossless compression scheme. To test the viability of the approach, a case study was performed on a 256-channel binocular rivalry dataset; the study also describes the statistical analyses and preprocessing steps, most of which are specific to this data. Compression results, evaluation metrics, and comparisons with baseline general-purpose compression methods suggest that the proposed method can achieve substantial compression and speed, making it a candidate topic for follow-up studies.
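The near-lossless step described in this abstract can be sketched in a few lines of Python. The sketch assumes the lossy reconstruction is already available (the autoencoder is not modelled), and it stores exact residuals for out-of-tolerance samples; how the authors actually encode their corrections is not stated in the abstract, so the function names and this storage format are assumptions.

```python
def residual_corrections(original, reconstructed, max_error):
    """Collect {sample index: residual} wherever the lossy error exceeds max_error."""
    return {
        i: o - r
        for i, (o, r) in enumerate(zip(original, reconstructed))
        if abs(o - r) > max_error
    }


def apply_corrections(reconstructed, corrections):
    """Add the stored residuals back so every sample is within the tolerance."""
    restored = list(reconstructed)
    for i, residual in corrections.items():
        restored[i] += residual
    return restored


# Hypothetical integer amplitudes and their lossy reconstruction.
signal = [10, 12, 15, 11]
lossy = [10, 14, 15, 8]
corrections = residual_corrections(signal, lossy, max_error=2)
restored = apply_corrections(lossy, corrections)
worst_error = max(abs(o - r) for o, r in zip(signal, restored))
```

Setting `max_error` to 0 stores a residual for every mismatching sample and makes the scheme fully lossless; larger tolerances store fewer corrections in exchange for a bounded amplitude error.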
Title: Near-lossless EEG signal compression using a convolutional autoencoder: Case study for 256-channel binocular rivalry dataset. Journal: Computers in Biology and Medicine, vol. 189, Article 109888. Publication date: 2025-03-05.
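The near-lossless idea described in the EEG compression abstract above (a lossy reconstruction plus corrections wherever the amplitude error exceeds a user-defined threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the convolutional autoencoder is replaced here by a coarse quantizer as a stand-in for any lossy codec, the samples are assumed to be integers, and all function names are hypothetical.

```python
def lossy_reconstruct(signal, step):
    """Stand-in for the autoencoder (an assumption): coarse uniform quantization."""
    return [round(x / step) * step for x in signal]


def near_lossless_encode(signal, reconstruction, max_error):
    """Store a residual only where the lossy error exceeds max_error."""
    corrections = {}
    for i, (x, r) in enumerate(zip(signal, reconstruction)):
        residual = x - r
        if abs(residual) > max_error:
            corrections[i] = residual  # exact residual, so this sample becomes lossless
    return corrections


def near_lossless_decode(reconstruction, corrections):
    """Apply the stored residuals on top of the lossy reconstruction."""
    return [r + corrections.get(i, 0) for i, r in enumerate(reconstruction)]


def max_abs_error(signal, decoded):
    """Largest amplitude deviation between the original and decoded samples."""
    return max(abs(x - d) for x, d in zip(signal, decoded))


# Hypothetical integer samples; real EEG data would come from the recording.
samples = [3, -17, 42, 8, -5, 100, 64, -33]
lossy = lossy_reconstruct(samples, step=8)
fixes = near_lossless_encode(samples, lossy, max_error=2)
decoded = near_lossless_decode(lossy, fixes)
print(len(fixes), max_abs_error(samples, decoded))
```

Lowering `max_error` stores more residuals and moves the scheme toward fully lossless coding, while raising it keeps fewer corrections and improves the compression ratio. How the residuals are entropy-coded is left out of this sketch.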
Pub Date : 2025-03-05 DOI: 10.1016/j.compbiomed.2025.109959
M. Barzegar Gerdroodbary , Sajad Salavatidezfouli
This study investigates the fluid-structure interaction (FSI) simulation of the abdominal aorta, with a particular focus on the hemodynamic alterations induced by aneurysmal deformations. The hemodynamic behavior within the aorta is highly dependent on the geometric characteristics of the aneurysm, necessitating the use of patient-specific models to ensure accurate predictions. The primary objective of this research is to enhance the predictive capability of flow and structural indices in a complex FSI biomechanical setting under varying physiological conditions, namely rest and exercise states. This paper presents a comparative analysis between two distinct yet promising surrogate models: Proper Orthogonal Decomposition coupled with Long Short-Term Memory (POD + LSTM) and Convolutional Neural Network combined with Long Short-Term Memory (CNN + LSTM). The methodology, model selection, and comparative performance analysis are discussed in detail, providing insights into the efficacy and limitations of each approach in the context of personalized cardiovascular simulations.
Title: A predictive surrogate model based on linear and nonlinear solution manifold reduction in cardiovascular FSI: A comparative study. Journal: Computers in Biology and Medicine, vol. 189, Article 109959. Publication date: 2025-03-05.
Pub Date : 2025-03-05 DOI: 10.1016/j.compbiomed.2025.109958
Fazla Rabby Raihan , Lway Faisal Abdulrazak , Md. Ashikur Rahman , Md Mamun Ali , Sobhy M. Ibrahim , Kawsar Ahmed , Francis M. Bui , Imran Mahmud
Tumor-homing peptides (THPs) have emerged as an attractive resource for targeted cancer therapy because they can selectively bind to and penetrate tumor cells while sparing adjacent healthy tissue. Computational models for predicting THPs have therefore become popular rapidly, since laboratory methods are slow and resource-intensive. Here we propose StackTHP, a stacking-ensemble model aimed at further improving THP prediction accuracy. StackTHP combines multiple feature extraction methods, including amino acid composition (AAC) and pseudo amino acid composition (PAAC), with classical machine learning classifiers such as Extra Trees, Random Forest, and AdaBoost; a logistic regression-based meta-classifier is used for the stacking framework. StackTHP outperformed all other models, achieving an accuracy of 91.92 %, a Matthews correlation coefficient (MCC) of 0.8415, and an AUC of 0.977 on benchmark datasets. These results indicate that it improves on earlier approaches and provides a robust solution for the discovery and development of peptide-based cancer therapies. Future research will focus on applying StackTHP to more diverse datasets and on hybrid methods to enhance its prediction capability. The dataset and the code are available at the following link: https://github.com/Ashikur562/StackTHP.
Title: StackTHP: A stacking ensemble model for accurate prediction of tumor-homing peptides in cancer therapy. Journal: Computers in Biology and Medicine, vol. 189, Article 109958. Publication date: 2025-03-05.
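Of the feature extraction methods named in the StackTHP abstract above, amino acid composition (AAC) is the simplest: it is the fraction of each of the 20 standard amino acids in a peptide. The sketch below illustrates that definition only. It is not taken from the StackTHP repository, the function name and example peptide are hypothetical, and the handling of non-standard residues (they are ignored) is an assumption.

```python
# The 20 standard amino acids in one-letter code, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return a 20-dimensional vector of residue fractions for one peptide."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in peptide.upper():
        if residue in counts:  # assumption: non-standard symbols are skipped
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("peptide contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Arbitrary example sequence, not drawn from the benchmark datasets.
features = amino_acid_composition("ACDKKA")
print(len(features), features[0])
```

In a stacking setup such as the one the abstract describes, vectors like `features` would be the input to the base classifiers, and the base classifiers' predictions would in turn be the input to the logistic regression meta-classifier.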
Pub Date : 2025-03-04 DOI: 10.1016/j.compbiomed.2025.109909
Ali Fahmi , Amy MacBrayne , Frances Humby , Paul Curzon , William Marsh
Dynamic Bayesian Networks (DBNs) are temporal probabilistic graphical models with a set of random variables and dependencies between them. DBNs have a meaningful structure and can model the continuity of events in discrete time-slices. In this study, we aimed to show how to build DBN models for self-management of chronic diseases using multiple sources of evidence.
Chronic diseases need life-long treatment. People with chronic diseases are commonly offered clinic visits at fixed intervals, but they can suffer from sudden increases in disease activity. We proposed an approach to build DBN models for self-management of chronic diseases in order to advise on treatment decisions. We used Rheumatoid Arthritis (RA) as a case-study and employed rheumatology experts’ knowledge, clinical data, clinical guidelines, and established literature to identify the variables, their states, the dependencies between the variables, and the parameters of the model. Because ideal data (i.e., a large dataset with sufficiently frequent observations) were unavailable, we adopted two approaches to make inferences for an initial evaluation of the model: manipulating the clinical data to increase their frequency and creating dummy patient scenarios. The initial evaluation indicated promising results for treatment decisions.
The proposed approach used multiple sources of evidence to build DBN models for self-management of chronic diseases. The resulting DBN for the RA case-study had a clinically meaningful structure, although it needed to be further evaluated and calibrated. The resulting DBN model has the potential to be used as a decision-support tool to help patients and clinicians better manage RA.
Title: Dynamic Bayesian network models for self-management of chronic diseases: Rheumatoid arthritis case-study. Journal: Computers in Biology and Medicine, vol. 189, Article 109909. Publication date: 2025-03-04.
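To make the notion of inference over discrete time-slices in the DBN abstract above more concrete, the following sketch runs forward filtering on a toy two-slice network with one hidden variable (disease activity) and one observed variable (a self-reported symptom level). Every variable, state, and probability here is a hypothetical placeholder chosen for illustration; none of it comes from the RA model in the paper.

```python
STATES = ("low", "high")  # hypothetical hidden disease-activity states

# Hypothetical parameters; a real model would elicit or learn these.
PRIOR = {"low": 0.8, "high": 0.2}
TRANSITION = {  # P(state at t | state at t-1)
    "low": {"low": 0.9, "high": 0.1},
    "high": {"low": 0.3, "high": 0.7},
}
EMISSION = {  # P(observed symptom | state)
    "low": {"mild": 0.8, "severe": 0.2},
    "high": {"mild": 0.25, "severe": 0.75},
}


def filter_step(belief, observation):
    """Advance the belief by one time-slice, then condition on the observation."""
    predicted = {
        s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES
    }
    unnormalized = {s: predicted[s] * EMISSION[s][observation] for s in STATES}
    total = sum(unnormalized.values())
    return {s: value / total for s, value in unnormalized.items()}


def filter_sequence(observations, prior=PRIOR):
    """Return the filtered belief after each observation.

    The prior is treated as the belief before the first time-slice, so the
    first observation is preceded by one transition step.
    """
    belief = dict(prior)
    history = []
    for observation in observations:
        belief = filter_step(belief, observation)
        history.append(belief)
    return history


for belief in filter_sequence(["severe", "severe", "mild"]):
    print({state: round(p, 3) for state, p in belief.items()})
```

A self-management tool could use a belief like this to decide when a sudden rise in estimated disease activity warrants contact with a clinician between fixed-interval visits. That decision logic is outside the scope of this sketch.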