Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109605
Chandra Mohini C. P., V. Raghavendran
Rapid advances in self-driving and connected vehicles have increased the commercial value of automotive applications, and the Digital Twin has emerged as a promising technology for modernizing the automotive industry. Moreover, the development of digital twins has equipped smart manufacturing systems with knowledge-generation capabilities, so training in a virtual environment minimizes errors on the shop floor. Still, extracting the insights needed to establish an optimal course sequence for shop-floor employees is computationally difficult. To overcome these issues, this paper develops a Dense Resolution High-order Attention Forward Harmonic Network (DRHAFHNet)-based course sequence recommendation for the learning effectiveness of shop-floor employees with a digital twin. The shop-floor owner collects data from the physical space, and a cloud server stores the data received from the shop-floor owner. The twin manager retrieves the data from the cloud server and simulates it in the virtual space. The virtual data is stored in the cloud, and course sequence recommendation is performed by the DRHAFHNet on a digital twin E-learning platform. The proposed model attains a Normalized Mean Square Error (MSE) of 0.334, a Normalized Mean Absolute Percentage Error (MAPE) of 0.332, a Normalized Root MSE (RMSE) of 0.342, a Normalized Mean Absolute Percentage (MAP) of 0.299, and a latency of 4.99 ms.
{"title":"DRHAFHNet: Dense resolution high-order attention forward harmonic network-based learning effectiveness of shopfloor employees with digital twin","authors":"Chandra Mohini C. P., V. Raghavendran","doi":"10.1016/j.bspc.2026.109605","DOIUrl":"10.1016/j.bspc.2026.109605","url":null,"abstract":"<div><div>The quick advances in the field of self-driving vehicles and connected automobiles have increased the commercial worth of automobile applications. Digital Twin is employed as a promising technology to modernize the automotive industry. Moreover, the development of digital twins has offered smart manufacturing systems with knowledge-making capabilities. Hence, the training in a virtual environment minimizes the errors on the shop floor. Still, the extraction of relevant insights to establish the optimal course sequence for the shop floor employees is computationally difficult. To overcome such issues, this paper develops the Dense Resolution High-order Attention Forward Harmonic Network (DRHAFHNet)-based course sequence recommendation for learning the effectiveness of shopfloor employees with a digital twin. The shop floor owner collects the data from the physical space, and the cloud server stores the data from the shop floor owner. The twin manager collects the data from the cloud server and simulates it in the virtual space. The virtual data is stored in the cloud, and the course sequence recommendation is performed by the DRHAFHNet using a digital twin E-learning platform. Moreover, the proposed model attains the Normalized Mean Square Error (MSE), Normalized Mean Absolute Percentage Error (MAPE), Normalized Root MSE (RMSE), Normalized Mean Absolute Percentage (MAP), and latency of 0.334, 0.332, 0.342, 0.299, and 4.99 ms.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109605"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109660
Gelan Ayana, Beshatu Debela Wako, So-yun Park, Kwangcheol Casey Jeong, Soon‑Do Yoon, Se‑woon Choe
Squamous cell carcinoma (SCC) is the second most common form of skin cancer, with significant public health implications owing to its potential for metastasis if not detected and treated early. Traditional diagnostic methods, which rely on histopathological analysis, face challenges such as variability in tissue morphology and dependence on expert interpretation, leading to inconsistent diagnoses. To address these issues, this work proposes a novel multistage transfer learning (MSTL) approach that leverages deep learning models for automated SCC diagnosis from histopathological images. The MSTL framework begins with a model pretrained on the extensive ImageNet dataset, fine-tuned on a large breast histopathology dataset to capture domain-specific features, and further refined on a smaller SCC histopathology dataset. Vision transformer (ViT) models were employed, marking a pioneering application in SCC analysis. The experimental results showed that the MSTL-based ViT model achieved state-of-the-art accuracy of 0.9752, precision of 0.9708, recall of 0.9734, F1 score of 0.9741, and an area under the receiver operating characteristic curve (AUC) of 0.9739, thereby setting a new benchmark. Furthermore, the MSTL approach demonstrated superior training efficiency, with reduced loss and faster convergence compared to conventional transfer learning models, without excessive computational costs. Evaluation on an independent dataset confirmed the robustness of the MSTL approach with the ViT model, achieving an AUC of 0.9437. The MSTL approach also exhibited strong transferability, with high Pearson correlation coefficients between the transferability measures and AUCs. Further investigations are needed to assess the generalizability of MSTL to other cancers and its applicability in clinical settings.
{"title":"Multistage transfer learning for skin squamous cell carcinoma histopathology image classification","authors":"Gelan Ayana , Beshatu Debela Wako , So-yun Park , Kwangcheol Casey Jeong , Soon‑Do Yoon , Se‑woon Choe","doi":"10.1016/j.bspc.2026.109660","DOIUrl":"10.1016/j.bspc.2026.109660","url":null,"abstract":"<div><div>Squamous cell carcinoma (SCC) is the second most common form of skin cancer with significant public health implications owing to its potential for metastasis if not detected and treated early. Traditional diagnostic methods, which rely on histopathological analysis, face challenges, such as variability in tissue morphology and dependence on expert interpretation, leading to inconsistent diagnoses. To address these issues, this work proposes a novel multistage transfer learning (MSTL) approach that leverages deep learning models for automated SCC diagnosis from histopathological images. The MSTL framework begins with a model pretrained on the extensive ImageNet dataset, fine-tuned on a large breast histopathology dataset to capture domain-specific features, and further refined on a smaller SCC histopathology dataset. Vision transformer (ViT) models have been employed, marking a pioneering application in SCC analysis. The experimental results showed that the MSTL-based ViT model achieved state-of-the-art accuracy of 0.9752, precision of 0.9708, recall of 0.9734, F1 score of 0.9741, and an area under receiver operating curve (AUC) of 0.9739, thereby setting a new benchmark. Furthermore, the MSTL approach demonstrated superior training efficiency, with reduced loss and faster convergence compared to the conventional transfer learning models, without excessive computational costs. Evaluation on an independent dataset confirmed the robustness of the MSTL approach with the ViT model, achieving an AUC of 0.9437. The MSTL approach also exhibited strong transferability, with high Pearson correlation coefficients between the transferability measures and AUCs. Further investigations are needed to assess the generalizability of MSTL to other cancers and its applicability in clinical settings.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109660"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109641
Li Kang, Chuanghong Zhao, Jianjun Huang, Zhixin Gong
Medical image segmentation is a crucial component of medical image processing. However, most medical image segmentation methods require a large number of labeled samples to train neural networks. The annotation work for medical image segmentation is both labor-intensive and technically demanding, resulting in high costs for obtaining high-quality annotations. This paper aims to perform medical image segmentation under conditions where annotations are lacking, using semi-supervised techniques. It proposes a multi-network cross pseudo-supervision method that leverages CNN and Transformer networks. By utilizing the differences between these networks, the method allows each network to learn from the perspectives provided by the others. To further enhance these differences and discover common features among pixels of the same category in different samples, a contrastive learning method is integrated into one of the CNN networks. Experiments conducted on benchmark datasets validate the effectiveness of the proposed method across different label proportions. Ablation studies on multi-network cross pseudo-supervision and the contrastive loss demonstrate the effectiveness of the network architecture and contrastive learning. Extensive experimental results show that the semi-supervised method proposed in this paper is superior to six existing semi-supervised learning methods. With a small number of annotated samples, this method can significantly improve network performance. This advancement significantly enhances the accuracy and efficiency of medical image segmentation, which is crucial for precise lesion detection, treatment planning, and therapeutic effect assessment, even with limited annotated data.
{"title":"Heterogeneous multi-network cross pseudo-supervised medical image segmentation","authors":"Li Kang, Chuanghong Zhao, Jianjun Huang, Zhixin Gong","doi":"10.1016/j.bspc.2026.109641","DOIUrl":"10.1016/j.bspc.2026.109641","url":null,"abstract":"<div><div>Medical image segmentation is a crucial component of medical image processing. However, most medical image segmentation methods require a large number of labeled samples to train neural networks. The annotation work for medical image segmentation is both labor-intensive and technically demanding, resulting in high costs for obtaining high-quality annotations. This paper aims to perform medical image segmentation under conditions of data lacking annotations using semi-supervised techniques. This paper proposes a multi-network cross pseudo-supervision method that leverages CNN and Transformer networks. By utilizing the differences between these networks, the method allows different networks to learn from the various perspectives provided by others. To further enhance these differences and discover common features among pixels of the same category in different samples, a contrastive learning method is integrated into one of the CNN networks. Experiments conducted on benchmark datasets have validated the effectiveness of the proposed method across different label proportions. Ablation studies on multi-network cross pseudo-supervision and contrastive loss demonstrate the effectiveness of the network architecture and contrastive learning. Extensive experimental results show that the semi-supervised proposed in this paper is superior to the existing six semi-supervised learning methods. With a small number of annotated samples, this method can significantly improve network performance. This advancement significantly enhances the accuracy and efficiency of medical image segmentation, crucial for precise lesion detection, treatment planning, and therapeutic effect assessment, even with limited annotated data.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109641"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109678
Chunping Gao, Lihua Guo, Qi Wu
In clinical practice, physicians usually review medical images from various modalities to obtain comprehensive structural insights that support accurate diagnosis. Inspired by this practice, Multi-Source Unsupervised Domain Adaptation (MUDA) aims to improve model generalization on an unlabeled target domain by leveraging structural knowledge from multiple labeled source domains. In medical image segmentation, existing MUDA frameworks mainly employ adversarial training to achieve domain adaptation, but they often capture limited structural information. In this work, we propose a dual alignment MUDA framework for medical image segmentation, which jointly uses image alignment and feature alignment to facilitate sufficient knowledge transfer. For image alignment, we apply Fourier-based domain adaptation (FDA) to mitigate appearance discrepancies between the source and target domains. For feature alignment, we integrate adversarial learning and curvature-based boundary consistency constraints to align global predictions and preserve local boundary details. Furthermore, we develop a performance-aware ensemble strategy that adaptively emphasizes models with superior performance, thereby improving prediction robustness. Extensive experiments on three publicly available datasets demonstrate that our method significantly outperforms existing state-of-the-art approaches. The proposed method enables robust unsupervised domain adaptation among medical images from multiple domains with large domain shifts.
{"title":"Multi-Source Unsupervised Domain Adaptation with dual alignment for medical image segmentation","authors":"Chunping Gao , Lihua Guo , Qi Wu","doi":"10.1016/j.bspc.2026.109678","DOIUrl":"10.1016/j.bspc.2026.109678","url":null,"abstract":"<div><div>In clinical practice, physicians usually review medical images from various modalities to obtain comprehensive structural insights that support accurate diagnosis. Inspired by this practice, Multi-Source Unsupervised Domain Adaptation (MUDA) aims to improve model generalization on an unlabeled target domain by leveraging structural knowledge from multiple labeled source domains. In medical image segmentation, existing MUDA frameworks mainly employ adversarial training to achieve domain adaptation, but they often capture limited structural information. In this work, we propose a dual alignment MUDA framework for medical image segmentation, which jointly uses image alignment and feature alignment to facilitate sufficient knowledge transfer. For image alignment, we apply Fourier-based domain adaptation (FDA) to mitigate appearance discrepancies between the source and target domains. For feature alignment, we integrate adversarial learning and curvature-based boundary consistency constraints to align global predictions and preserve local boundary details. Furthermore, we develop a performance-aware ensemble strategy that adaptively emphasizes models with superior performance, thereby improving prediction robustness. Extensive experiments on three publicly available datasets demonstrate that our method significantly outperforms existing state-of-the-art approaches. The proposed method enables robust unsupervised domain adaptation among medical images from multiple domains with large domain shifts.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109678"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109652
Jina E., Lin Wang, Bingchao Wan, Wenjie Yu, Yushi Chen, Lingxia Fei, Feng Yang, Jun Zhuang
Epilepsy is a common chronic neurological disorder, and its recurrent seizures severely impact patients’ physical and mental health as well as social functioning. This study presents a novel deep learning framework inspired by neuroscience, utilizing a Topological Convolutional Neural Network with Long Short-Term Memory (TSCNN-LSTM). This framework aims to enhance the quality of life for epilepsy patients and optimize clinical management strategies. The model utilizes a convolutional neural network to extract time–frequency features from EEG signals and an LSTM network to simulate the dynamic interactions between brain regions, achieving accurate seizure prediction. The TSCNN-LSTM model was evaluated across three datasets: CHB-MIT, Siena, and our clinical dataset. On CHB-MIT, the model achieved 96.0% sensitivity, 95.0% specificity, and 98.0% AUC. Siena dataset validation yielded 89.2% accuracy and 94.8% AUC. Clinical data evaluation demonstrated 85.9% accuracy and 92.5% AUC, confirming robust performance across diverse recording conditions. These consistent metrics across heterogeneous datasets validate the model’s exceptional predictive accuracy and cross-dataset generalizability, establishing its potential for clinical seizure prediction applications.
{"title":"Topology-convolutional long short-term memory network for epileptic seizure prediction: An interpretable deep learning framework based on a multi-center dataset","authors":"Jina E. , Lin Wang , Bingchao Wan , Wenjie Yu , Yushi Chen , Lingxia Fei , Feng Yang , Jun Zhuang","doi":"10.1016/j.bspc.2026.109652","DOIUrl":"10.1016/j.bspc.2026.109652","url":null,"abstract":"<div><div>Epilepsy is a common chronic neurological disorder, and its recurrent seizures severely impact patients’ physical and mental health as well as social functioning. This study presents a novel deep learning framework inspired by neuroscience, utilizing a Topological Convolutional Neural Network with Long Short-Term Memory (TSCNN-LSTM). This framework aims to enhance the quality of life for epilepsy patients and optimize clinical management strategies. The model utilizes a convolutional neural network to extract time–frequency features from EEG signals and an LSTM network to simulate the dynamic interactions between brain regions, achieving accurate seizure prediction. The TSCNN-LSTM model was evaluated across three datasets: CHB-MIT, Siena, and our clinical dataset. On CHB-MIT, the model achieved 96.0% sensitivity, 95.0% specificity, and 98.0% AUC. Siena dataset validation yielded 89.2% accuracy and 94.8% AUC. Clinical data evaluation demonstrated 85.9% accuracy and 92.5% AUC, confirming robust performance across diverse recording conditions. These consistent metrics across heterogeneous datasets validate the model’s exceptional predictive accuracy and cross-dataset generalizability, establishing its potential for clinical seizure prediction applications.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109652"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109656
Asmaa Hammad, Mosa E. Hosney, Marwa M. Emam, Nagwan Abdel Samee, Reem Ibrahim Alkanhel, Essam H. Houssein
Accurately detecting and staging epileptic seizures is crucial in medical diagnostics, as early identification significantly improves patient survival rates. Electroencephalogram (EEG) signals are widely used for seizure detection, yet no universally accepted feature set exists for this purpose. While incorporating all possible EEG features may enhance classification accuracy, excessive dimensionality introduces redundancy and inefficiency, ultimately reducing overall performance. To overcome this challenge, we propose a novel wrapper-based feature selection method, ECO-OL, which integrates Orthogonal Learning (OL) with the Educational Competition Optimizer (ECO) to form a hybrid optimization algorithm. ECO-OL enhances the search process, selects the most relevant features, and prevents premature convergence while maintaining classifier performance. Additionally, we introduce a hybrid classification model, ECO-OL with MLPNN, which combines the improved ECO-OL feature selection approach with a multi-layer perceptron neural network (MLPNN) to optimize EEG seizure classification. ECO-OL was thoroughly assessed with the CEC’22 test suite and the Khas EEG seizure detection dataset, consisting of preictal, interictal, and ictal EEG signals. The method was applied to classify interictal vs. ictal, interictal vs. preictal, preictal vs. ictal, and all three classes combined. Experimental results demonstrate that ECO-OL outperforms ECO and six widely used metaheuristic algorithms in diversity, convergence, and statistical measures. The proposed method achieves 99.2% accuracy in ictal–preictal classification and 95.9% in interictal–preictal classification, surpassing existing methods by 1.2% and 15.8%, respectively. This study provides a robust computational framework for seizure detection, with promising applications in AI-driven medical research and bioinformatics.
{"title":"An enhanced educational competition optimizer and feedforward neural networks for automatic seizure detection in EEG signals","authors":"Asmaa Hammad , Mosa E. Hosney , Marwa M. Emam , Nagwan Abdel Samee , Reem Ibrahim Alkanhel , Essam H. Houssein","doi":"10.1016/j.bspc.2026.109656","DOIUrl":"10.1016/j.bspc.2026.109656","url":null,"abstract":"<div><div>Accurately detecting and staging epileptic seizures is crucial in medical diagnostics, as early identification significantly improves patient survival rates. Electroencephalogram (EEG) signals are widely used for seizure detection, yet no universally accepted feature set exists for this purpose. While incorporating all possible EEG features may enhance classification accuracy, excessive dimensionality introduces redundancy and inefficiency, ultimately reducing overall performance. To overcome this challenge, we propose a novel wrapper-based feature selection method, ECO-OL, which integrates Orthogonal Learning (OL) with the Educational Competition Optimizer (ECO) to form a hybrid optimization algorithm. ECO-OL enhances the search process, selects the most relevant features, and prevents premature convergence while maintaining classifier performance. Additionally, we introduce a hybrid classification model, ECO-OL with MLPNN, which combines the improved ECO-OL feature selection approach with a multi-layer perceptron neural network (MLPNN) to optimize EEG seizure classification. ECO-OL was thoroughly assessed with the CEC’22 test suite and the Khas EEG seizure detection dataset, consisting of preictal, interictal, and ictal EEG signals. The method was applied to classify interictal vs. ictal, interictal vs. preictal, preictal vs. ictal, and all three classes combined. Experimental results demonstrate that ECO-OL outperforms ECO and six widely used metaheuristic algorithms in diversity, convergence, and statistical measures. The proposed method achieves 99.2% accuracy in ictal–preictal classification and 95.9% in interictal–preictal classification, surpassing existing methods by 1.2% and 15.8%, respectively. This study provides a robust computational framework for seizure detection, with promising applications in AI-driven medical research and bioinformatics.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109656"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109672
Qing Lu, Longbo Zheng, Jionglong Su, Weimin Ma, Hui Ma, Yulin Zhang
An innovative multimodal deep learning model, MCAB-GFEResNet, was developed to predict treatment response to neoadjuvant chemoradiotherapy (nCRT) in patients with locally advanced rectal cancer. The system integrates multiparametric baseline data acquired prior to nCRT initiation: (1) pre-treatment whole-slide biopsy images (WSI) obtained during initial diagnostic workup, (2) baseline magnetic resonance imaging (MRI) performed after biopsy but before nCRT commencement, and (3) pretreatment clinical biomarkers (including carcinoembryonic antigen [CEA] levels) collected contemporaneously with imaging. The architecture innovatively introduces two key components: a Multimodal Clue Attention Bridge (MCAB) fusion strategy that achieves deep feature fusion through cross-modal interaction, and a Global Feature Enhancement (GFE) module that precisely captures discriminative features in treatment response via dual-dimensional (channel and spatial) feature aggregation mechanisms. Prospective validation on 185 patients achieved 97.21% accuracy, a 5.6–20.2% absolute improvement over reference fusion methods, with 97.25% sensitivity using only pretreatment data. This enables therapeutic decisions to be made at least 4–6 weeks earlier compared to post-surgical pathology-dependent approaches. It demonstrates significant advantages in both predictive accuracy and temporal efficiency, establishing the first multimodal framework for precision prediction of nCRT response using solely pretreatment data in rectal cancer.
{"title":"MCAB-GFEResNet: A multimodal fusion model for pre-treatment prediction of neoadjuvant chemoradiotherapy response in rectal cancer","authors":"Qing Lu , Longbo Zheng , Jionglong Su , Weimin Ma , Hui Ma , Yulin Zhang","doi":"10.1016/j.bspc.2026.109672","DOIUrl":"10.1016/j.bspc.2026.109672","url":null,"abstract":"<div><div>An innovative multimodal deep learning model, MCAB-GFEResNet, was developed to predict treatment response to neoadjuvant chemoradiotherapy (nCRT) in patients with locally advanced rectal cancer. The system integrates multiparametric baseline data acquired prior to nCRT initiation: (1) pre-treatment whole-slide biopsy images (WSI) obtained during initial diagnostic workup, (2) baseline magnetic resonance imaging (MRI) performed after biopsy but before nCRT commencement, and (3) pretreatment clinical biomarkers (including carcinoembryonic antigen [CEA] levels) collected contemporaneously with imaging. The architecture innovatively introduces two key components: a Multimodal Clue Attention Bridge (MCAB) fusion strategy that achieves deep feature fusion through cross-modal interaction, and a Global Feature Enhancement (GFE) module that precisely captures discriminative features in treatment response via dual-dimensional (channel and spatial) feature aggregation mechanisms. Prospective validation on 185 patients achieved 97.21% accuracy, a 5.6–20.2% absolute improvement over reference fusion methods, with 97.25% sensitivity using only pretreatment data. This enables therapeutic decisions to be made at least 4–6 weeks earlier compared to post-surgical pathology-dependent approaches. It demonstrates significant advantages in both predictive accuracy and temporal efficiency, establishing the first multimodal framework for precision prediction of nCRT response using solely pretreatment data in rectal cancer.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109672"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109689
Yuna Park, Heeseung Cho, Junhyoung Oh
Sleep stages are vital for assessing sleep quality and vary significantly between individuals. Polysomnography (PSG), the clinical gold standard, is impractical for continuous monitoring due to its complexity. Wearable devices using photoplethysmography (PPG) offer a scalable, non-invasive alternative, but PPG signals are prone to motion artifacts. To address this, we utilized multiwavelength PPG signals (two green, red, and infrared channels) from a commercial smartwatch, adhering to American Academy of Sleep Medicine (AASM) standards, to classify sleep stages. We compared machine learning and deep learning models, evaluating single-channel and multi-channel fusion strategies. Our convolutional neural network combined with a gated recurrent unit (CNN+GRU) late ensemble fusion model achieved the highest performance, with a precision of 73.8% and a Cohen's kappa of 0.535, outperforming the best single green channel model by 2.8% (p = 0.0015, rank-biserial correlation = 0.6670). Green channels excelled in detecting rapid eye movement stages, while the red channel enhanced the differentiation between deep sleep (DS) and light sleep (LS) by 10%–14%. Temporal Saliency Rescaling (TSR) analysis confirmed that the model focuses on relevant signal regions, improving robustness against motion artifacts. Wilcoxon signed rank tests validated the superiority of the multichannel model (p = 0.0015 for green channel 1, p = 0.0147 for green channel 2). These findings highlight the potential of multiwavelength PPG for accurate, interpretable sleep stage classification, allowing scalable sleep monitoring as a viable alternative to PSG.
{"title":"Deep learning ensemble approach to multi-wavelength photoplethysmogram for sleep stage detection","authors":"Yuna Park , Heeseung Cho , Junhyoung Oh","doi":"10.1016/j.bspc.2026.109689","DOIUrl":"10.1016/j.bspc.2026.109689","url":null,"abstract":"<div><div>Sleep stages are vital for assessing sleep quality, and vary significantly between individuals. Polysomnography (PSG), the clinical gold standard, is impractical for continuous monitoring due to its complexity. Wearable devices using photoplethysmography (PPG) offer a scalable, non-invasive alternative, but PPG signals are prone to motion artifacts. To address this, we utilized multiwavelength PPG signals (two green, red, and infrared channels) from a commercial smartwatch, adhering to American Academy of Sleep Medicine (AASM) standards, to classify sleep stages. We compared machine learning and deep learning models, evaluating single-channel and multi-channel fusion strategies. Our convolutional neural network combined with the gated recurrent unit (CNN+GRU) late ensemble fusion model achieved the highest performance, with a precision of 73.8% and a Cohen kappa of 0.535, outperforming the best single green channel model by 2.8% (<span><math><mrow><mi>p</mi><mo>=</mo><mn>0</mn><mo>.</mo><mn>0015</mn></mrow></math></span>, rank-biserial correlation = 0.6670). Green channels excelled in detecting rapid eye movement stages, while the red channel enhanced the differentiation between deep sleep (DS) and light sleep (LS) by 10%–14%. Temporal Saliency Rescaling (TSR) analysis confirmed the focus of the model on relevant signal regions, improving robustness against motion artifacts. Wilcoxon signed rank tests validated the superiority of the multichannel model (<span><math><mrow><mi>p</mi><mo>=</mo><mn>0</mn><mo>.</mo><mn>0015</mn></mrow></math></span> for green channel 1, <span><math><mrow><mi>p</mi><mo>=</mo><mn>0</mn><mo>.</mo><mn>0147</mn></mrow></math></span> for green channel 2). These findings highlight the potential of multiwavelength PPG for accurate, interpretable sleep stage classification, allowing scalable sleep monitoring as a viable alternative to PSG.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109689"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109642
Zuoping Tan, Xuan Chen, Shuangcheng Li, Lijuan Yue, Tinghui Huang, Rui Yao, Jianying Lv, Jing Li, Caiye Fan, Riwei Wang, Yuanyuan Wang, Yan Wang
Background
The variational encoding Bayesian Gaussian mixture model is a novel unsupervised machine learning model that combines variational encoding and Bayesian inference. The model uses a variational encoder to learn a continuous, structured latent space of a dataset and combines multiple Gaussian mixture distributions, offering robust learning and generalisation capabilities. Although well suited to modelling complex relationships, this model has not yet been explored for grading keratoconus, a blinding eye disease with an unclear etiology and no standardised diagnostic and treatment system. Therefore, this study applies the variational encoding Bayesian Gaussian mixture model to categorise keratoconus severity and identifies relatively sensitive features for early diagnosis and intervention, potentially improving clinical decision-making and visual outcomes.
Results
The eyes of 456 patients with keratoconus are analysed. Using the variational self-encoder Bayesian Gaussian mixture model for unsupervised keratoconus grading, with classification into four categories, an accuracy of 84% is achieved, over 30% higher than that of other unsupervised algorithms, such as K-means and DBSCAN. The area under the curve values of the four categories are 0.923, 0.856, 0.789, and 0.986. Moreover, further analyses show that features such as minimum thickness (Pachy min) are more sensitive to grading outcomes.
Conclusions
The variational encoded Bayesian Gaussian model effectively captures the key features of keratoconus severity and enables accurate, automatic grading. This provides valuable clinical references for assessing disease progression and treatment, and presents a novel approach for the early diagnosis of keratoconus.
{"title":"Innovative Algorithm for Keratoconus Intelligent Grading Using Variational Encoding Bayesian Gaussian Mixture Model","authors":"Zuoping Tan , Xuan Chen , Shuangcheng Li , Lijuan Yue , Tinghui Huang , Rui Yao , Jianying Lv , Jing Li , Caiye Fan , Riwei Wang , Yuanyuan Wang , Yan Wang","doi":"10.1016/j.bspc.2026.109642","DOIUrl":"10.1016/j.bspc.2026.109642","url":null,"abstract":"<div><h3>Background</h3><div>The variational encoding Bayesian Gaussian mixture model is a novel unsupervised machine learning model that combines variational encoding and Bayesian inference. The model uses a variational encoder to learn the continuous, structured latent space of a dataset and combines multiple Gaussian mixture distributions, offering robust learning and generalisation capabilities. Although it is well-suited for modelling complex relationships, this model has not yet been explored in grading keratoconus, a blinding eye disease with unclear etiology or standardised diagnostic treatment systems. Therefore, this study applies the Bayesian Gaussian mixture model to categorise keratoconus severity and identifies relatively sensitive features for early diagnoses and interventions, potentially improving clinical decision-making and visual outcomes.</div></div><div><h3>Results</h3><div>The eyes of 456 patients with keratoconus are analysed. Using the variational self-encoder Bayesian Gaussian mixture model for unsupervised keratoconus grading, with classification into four categories, an accuracy of 84% is achieved, over 30% higher than that of other unsupervised algorithms, such as K-means and DBSCAN. The area under the curve values of the four categories are 0.923, 0.856, 0.789, and 0.986. Moreover, further analyses show that features such as minimum thickness (Pachy min) are more sensitive to grading outcomes.</div></div><div><h3>Conclusions</h3><div>The variational encoded Bayesian Gaussian model effectively captures the key features of keratoconus severity and enables accurate, automatic grading. This provides valuable clinical references for assessing disease progression and treatment, and presents a novel approach for the early diagnosis of keratoconus.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109642"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-23. DOI: 10.1016/j.bspc.2026.109643
Yinan Zhang, Yigang Pei, Ting Meng, Yu Wang, Yong Zhong, Yixiong Liang
The prediction of treatment resistance in myeloperoxidase (MPO)-anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (MPO-AAV) with lung involvement is critical, as a significant number of patients develop resistance to existing therapies, resulting in more severe consequences. Previous methods leverage radiomics analysis to extract features from CT images, which are then combined with clinical information for prediction. However, these methods rely on radiologists to manually delineate regions of interest (ROIs) in CT images, a process that is time-consuming and labor-intensive, and their performance is constrained by a simple linear fusion of radiomic features and clinical information. In this paper, we propose a novel multimodal learning method for predicting treatment resistance in MPO-AAV with lung involvement. We first introduce the lesion-aware re-embedding (LARE) module and the cross-slice interaction (CSI) module to adapt a powerful vision foundation model (VFM) for extracting visual representations from CT images, thereby eliminating the dependence on lesion ROIs during inference. Furthermore, we utilize a Transformer to extract high-level semantic information from raw clinical data and incorporate a learnable multimodal feature fusion module (MFFM) to enhance the interaction and integration of multimodal features. We conduct experiments on a dual-center dataset. The experimental results show that our proposed method achieves superior performance, exceeding previous methods by a clear margin. Additionally, we visualize the perception map of lesion ROIs generated by the LARE module and assess the importance of clinical attributes, qualitatively analyzing the performance of the proposed method. The code and trained models are publicly available at https://github.com/CVIU-CSU/PTRNet.
{"title":"A novel multimodal learning method for predicting treatment resistance in MPO-AAV with lung involvement","authors":"Yinan Zhang , Yigang Pei , Ting Meng , Yu Wang , Yong Zhong , Yixiong Liang","doi":"10.1016/j.bspc.2026.109643","DOIUrl":"10.1016/j.bspc.2026.109643","url":null,"abstract":"<div><div>The prediction of treatment resistance in myeloperoxidase (MPO)-anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (MPO-AAV) with lung involvement is critical, as a significant number of patients develop resistance to existing therapies, resulting in more severe consequences. Previous methods leverage radiomics analysis to extract features from CT images, which are then combined with clinical information for prediction. However, these methods rely on radiologists to manually delineate regions of interest (ROIs) for CT images, a process that is time-consuming and labor-intensive, and their performance is constrained by a simple linear fusion of radiomic features and clinical information. In this paper, we propose a novel multimodal learning method for predicting treatment resistance in MPO-AAV with lung involvement. We first introduce the lesion-aware re-embedding (LARE) module and the cross-slice interaction (CSI) module to adapt the powerful vision foundation model (VFM) for extracting visual representation from CT images, thereby eliminating the dependence on lesion ROIs during inference. Furthermore, we utilize Transformer to extract high-level semantic information from raw clinical data and incorporate a learnable multimodal feature fusion module (MFFM) to enhance the interaction and integration of multimodal features. We construct experiments on a dual-center dataset. The experimental results show that our proposed method achieves superior performance, exceeding previous methods by a clear margin. Additionally, we visualize the perception map of lesion ROIs generated by the LARE module and assess the importance of clinical attributes, qualitatively analyzing the performance of the proposed method. The code and trained models are publicly available at <span><span>https://github.com/CVIU-CSU/PTRNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"117 ","pages":"Article 109643"},"PeriodicalIF":4.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146039015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}