Pub Date : 2025-02-26 DOI: 10.1109/JBHI.2025.3546019
MLST-Net: Multi-Task Learning based Spatial-Temporal Disentanglement Scheme for Video Facial Paralysis Severity Grading
Zehui Feng, Tongtong Zhou, Ting Han
Facial paralysis, a common nervous system disorder, severely affects patients' facial muscle function and appearance. Accurate facial paralysis grading is essential for formulating personalized treatment. Existing artificial-intelligence-based grading methods focus largely on static image classification, which fails to capture dynamic facial movements. Additionally, because of privacy concerns, building comprehensive facial paralysis datasets is challenging, making it impractical to fully train a robust model from scratch. Finally, maintaining both precision and inference speed on edge devices remains a key challenge. To address these shortcomings, we propose MLST-Net, a novel and explainable three-stage deep-learning method based on multi-task learning. In the first stage, a pre-trained model extracts the static facial appearance structure and dynamic texture changes. The second stage fuses the proxy-task results to construct a unified facial semantic representation and outputs the result of the simple task of detecting whether facial paralysis is present. In the third stage, we use spatial-temporal disentanglement to capture the combined spatial-temporal dependencies in video sequences. Finally, the resulting representation is fed to a classifier to solve the complex task of facial paralysis severity classification. Compared with advanced methods, MLST-Net is computationally inexpensive and achieves state-of-the-art results on 1,241 videos from public datasets. It significantly benefits the digital diagnosis of facial palsy and offers innovative, explainable ideas for video-based digital medical treatment.
{"title":"MLST-Net: Multi-Task Learning based SpatialTemporal Disentanglement Scheme for Video Facial Paralysis Severity Grading.","authors":"Zehui Feng, Tongtong Zhou, Ting Han","doi":"10.1109/JBHI.2025.3546019","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3546019","url":null,"abstract":"<p><p>Facial paralysis, as a common nerve system disease, seriously affects the patients' facial muscle function and appearance. Accurate facial paralysis grading is of great significance for the formulation of personalized treatment. Existing artificial intelligence based grading methods extensively focus on static image classification, which fails to capture the dynamic facial movements. Additionally, due to private concerns, building comprehensive facial paralysis datasets is challenging, making it impractical to fully train a robust model from scratch. Finally, maintaining precision and inference speed on edge devices remains a key challenge. To address these shortcomings, we propose MLST-Net, a novel and explainable three-stage deep-learning method based on multi-task learning. In the first stage, the pre-trained model is used to extract the facial static appearance structure and dynamic texture changes. The second stage fuses the proxy task results to construct a unified face semantic expression and outputs the \"with or without facial paralysis\" simple task results. In the third stage, we use spatial-temporal disentanglement to capture the spatial-temporal combinatorial-dependencies in video sequences. Finally, we input the classifier to get the results of complex tasks of facial paralysis classification. Compared with all advanced methods, MLST-Net is computationally inexpensive and achieves state-of-the-art results on the 1241 public dataset videos. It significantly benefits the digital diagnosis of facial palsy and offers innovative and explainable ideas for video-based digital medical treatment.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-26 DOI: 10.1109/JBHI.2025.3545927
Uncertainty-Inspired Multi-Task Learning in Arbitrary Scenarios of ECG Monitoring
Xingyao Wang, Hongxiang Gao, Caiyun Ma, Tingting Zhu, Feng Yang, Chengyu Liu, Huazhu Fu
As the scenarios for electrocardiogram (ECG) monitoring become increasingly diverse, particularly with the development of wearable ECG, the influence of ambiguous factors on diagnosis has been amplified. Reliable ECG information must be extracted from abundant noise and confounding artifacts. To address this issue, we propose an uncertainty-inspired model for beat-level diagnosis (UI-Beat). The base architecture of UI-Beat separates heartbeat localization and event diagnosis into two branches to address the problem of heterogeneous data sources. To disentangle epistemic and aleatoric uncertainty within a single stage in a deterministic neural network, we propose a new method derived from an uncertainty formulation and realize it by introducing a class-biased transformation. The disentangled uncertainty can then be used to screen out noise and identify ambiguous heartbeats simultaneously. The results indicate that UI-Beat significantly improves noise detection (from 91.60% to 97.50% for real-world noise detection and from 61.40% to 82.41% for real-world artifact detection). For multi-lead ECG analysis, UI-Beat approaches the performance upper bound in heartbeat localization (only 15 false positives and 9 false negatives out of the 175,907 heartbeats in the INCART database) and achieves a significant improvement in heartbeat classification through uncertainty-based cross-lead fusion compared with single-lead prediction and other state-of-the-art methods (an average improvement of 14.28% for detecting class-S heartbeats and 3.37% for detecting class-V heartbeats). Given its one-stage, single-model design, the proposed UI-Beat has the potential to serve as a general model for arbitrary ECG monitoring scenarios, with the capacity to remove invalid episodes and provide heartbeat-level diagnoses with accompanying confidence.
{"title":"Uncertainty-Inspired Multi-Task Learning in Arbitrary Scenarios of ECG Monitoring.","authors":"Xingyao Wang, Hongxiang Gao, Caiyun Ma, Tingting Zhu, Feng Yang, Chengyu Liu, Huazhu Fu","doi":"10.1109/JBHI.2025.3545927","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3545927","url":null,"abstract":"<p><p>As the scenarios for electrocardiogram (ECG) monitoring become increasingly diverse, particularly with the development of wearable ECG, the influence of ambiguous factors in diagnosis has been amplified. Reliable ECG information must be extracted from abundant noises and confusing artifacts. To address this issue, we suggest an uncertainty-inspired model for beat-level diagnosis (UI-Beat). The base architecture of UI-Beat separates heartbeat localization and event diagnosis in two branches to address the problem of heterogeneous data sources. To disentangle the epistemic and aleatoric uncertainty within one stage in a deterministic neural network, we propose a new method derived from uncertainty formulation and realize it by introducing the class-biased transformation. Then the disentangled uncertainty can be utilized to screen out noise and identify ambiguous heartbeat synchronously. The results indicate that UI-Beat can significantly improve the performance of noise detection (from 91.60% to 97.50% for real-world noise detection and from 61.40% to 82.41% for real-world artifact detection). For multi-lead ECG analysis, UI-Beat is approaching the performance upper bound in heartbeat localization (only 15 false positives and 9 false negatives out of the 175,907 heartbeats in the INCART database) and achieving a significant performance improvement in heartbeat classification through uncertainty-based cross-lead fusion compared to single-lead prediction and other state-of-the-art methods (an average improvement of 14.28% for detecting heartbeats of S and 3.37% for detecting heartbeats of V). Considering the characteristic of one-stage ECG analysis within one model, it is suggested that the proposed UI-Beat has the potential to be employed as a general model for arbitrary scenarios of ECG monitoring, with the capacity to remove invalid episodes, and realize heartbeat-level diagnosis with confidence provided.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-25 DOI: 10.1109/JBHI.2025.3545138
Multi-organ Segmentation from Partially Labeled and Unaligned Multi-modal MRI in Thyroid-associated Orbitopathy
Cheng Chen, Min Deng, Yuan Zhong, Jinyue Cai, Karen Kar Wun Chan, Qi Dou, Kelvin Kam Lung Chong, Pheng-Ann Heng, Winnie Chiu-Wing Chu
Thyroid-associated orbitopathy (TAO) is a prevalent inflammatory autoimmune disorder that leads to orbital disfigurement and visual disability. Automatic comprehensive segmentation tailored for quantitative multi-modal MRI assessment of TAO holds enormous promise but is still lacking. In this paper, we propose a novel method, named cross-modal attentive self-training (CMAST), for multi-organ segmentation in TAO using partially labeled and unaligned multi-modal MRI data. Our method first introduces a dedicated cross-modal pseudo-label self-training scheme, which leverages self-training to refine the initial pseudo labels generated by cross-modal registration, so as to complete the label sets for comprehensive segmentation. With the obtained pseudo labels, we further devise a learnable attentive fusion module to aggregate multi-modal knowledge based on learned cross-modal feature attention, which relaxes the requirement of pixel-wise alignment across modalities. A prototypical contrastive learning loss is further incorporated to facilitate cross-modal feature alignment. We evaluate our method on a large clinical TAO cohort with 100 cases of multi-modal orbital MRI. The experimental results demonstrate the promising performance of our method in achieving comprehensive segmentation of TAO-affected organs on both T1 and T1c modalities, outperforming previous methods by a large margin. Code will be released upon acceptance.
{"title":"Multi-organ Segmentation from Partially Labeled and Unaligned Multi-modal MRI in Thyroid-associated Orbitopathy.","authors":"Cheng Chen, Min Deng, Yuan Zhong, Jinyue Cai, Karen Kar Wun Chan, Qi Dou, Kelvin Kam Lung Chong, Pheng-Ann Heng, Winnie Chiu-Wing Chu","doi":"10.1109/JBHI.2025.3545138","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3545138","url":null,"abstract":"<p><p>Thyroid-associated orbitopathy (TAO) is a prevalent inflammatory autoimmune disorder, leading to orbital disfigurement and visual disability. Automatic comprehensive segmentation tailored for quantitative multi-modal MRI assessment of TAO holds enormous promise but is still lacking. In this paper, we propose a novel method, named cross-modal attentive self-training (CMAST), for the multi-organ segmentation in TAO using partially labeled and unaligned multi-modal MRI data. Our method first introduces a dedicatedly designed cross-modal pseudo label self-training scheme, which leverages self-training to refine the initial pseudo labels generated by cross-modal registration, so as to complete the label sets for comprehensive segmentation. With the obtained pseudo labels, we further devise a learnable attentive fusion module to aggregate multi-modal knowledge based on learned cross-modal feature attention, which relaxes the requirement of pixel-wise alignment across modalities. A prototypical contrastive learning loss is further incorporated to facilitate cross-modal feature alignment. We evaluate our method on a large clinical TAO cohort with 100 cases of multi-modal orbital MRI. The experimental results demonstrate the promising performance of our method in achieving comprehensive segmentation of TAO-affected organs on both T1 and T1c modalities, outperforming previous methods by a large margin. Code will be released upon acceptance.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-25 DOI: 10.1109/JBHI.2025.3545159
Adaptive Metadata-Guided Supervised Contrastive Learning for Domain Adaptation on Respiratory Sound Classification
June-Woo Kim, Miika Toikkanen, Amin Jalali, Minseok Kim, Hye-Ji Han, Hyunwoo Kim, Wonwoo Shin, Ho-Young Jung, Kyunghoon Kim
Despite considerable advances in deep learning, optimizing respiratory sound classification (RSC) models remains challenging. This is partly due to bias from inconsistent respiratory sound recording processes and imbalanced representation of demographics, which leads to poor performance when a model trained on such a dataset is applied to real-world use cases. RSC datasets usually include metadata attributes describing certain aspects of the data, such as environmental and demographic factors. To address the issues caused by bias, we take advantage of the metadata provided by RSC datasets and explore approaches for metadata-guided domain adaptation. We thoroughly evaluate the effect of various metadata attributes and their combinations within a simple metadata-guided approach, and we also introduce a more advanced method that adaptively rescales suitable metadata combinations to improve domain adaptation during training. The findings indicate a robust reduction in domain dependency and an improvement in detection accuracy on both the ICBHI dataset and our own dataset. Specifically, our proposed methods improved the score to 84.97%, a substantial gain of 7.37% over the baseline model.
{"title":"Adaptive Metadata-Guided Supervised Contrastive Learning for Domain Adaptation on Respiratory Sound Classification.","authors":"June-Woo Kim, Miika Toikkanen, Amin Jalali, Minseok Kim, Hye-Ji Han, Hyunwoo Kim, Wonwoo Shin, Ho-Young Jung, Kyunghoon Kim","doi":"10.1109/JBHI.2025.3545159","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3545159","url":null,"abstract":"<p><p>Despite considerable advancements in deep learning, optimizing respiratory sound classification (RSC) models remains challenging. This is partly due to the bias from inconsistent respiratory sound recording processes and imbalanced representation of demographics, which leads to poor performance when a model trained with the dataset is applied to real-world use cases. RSC datasets usually include various metadata attributes describing certain aspects of the data, such as environmental and demographic factors. To address the issues caused by bias, we take advantage of the metadata provided by RSC datasets and explore approaches for metadata-guided domain adaptation. We thoroughly evaluate the effect of various metadata attributes and their combinations on a simple metadata-guided approach, but also introduce a more advanced method that adaptively rescales the suitable metadata combinations to improve domain adaptation during training. The findings indicate a robust reduction in domain dependency and improvement in detection accuracy on both ICBHI and our own dataset. Specifically, the implementation of our proposed methods led to an improved score of 84.97%, which signifies a substantial enhancement of 7.37% compared to the baseline model.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-25 DOI: 10.1109/JBHI.2025.3545172
Contrastive Learning Guided Fusion Network for Brain CT and MRI
Yuping Huang, Weisheng Li, Bin Xiao, Guofen Wang, Dan He, Xiaoyu Qiao
Medical image fusion technology provides professionals with more detailed and precise diagnostic information. This paper introduces a new, efficient CT and MRI fusion network, CLGFusion, guided by contrastive learning. CLGFusion includes two encoding branches at the feature encoding stage, enabling them to interact and learn from each other. The approach begins by training a single-view encoder to predict the feature representation of an image from varied augmented views. Simultaneously, the multi-view encoder is updated using the exponential moving average of the single-view encoder. Contrastive learning is integrated into medical image fusion by creating a feature contrast space without constructing negative samples. This feature contrast space exploits the differences between the feature representations of the source image and its corresponding augmented image. Combined with a structural similarity loss, it continuously guides the network to optimize its fusion results, achieving more accurate and efficient image fusion. The approach is an end-to-end unsupervised fusion model. Experimental validation shows that the proposed method performs comparably to state-of-the-art techniques in both subjective evaluation and objective metrics.
{"title":"Contrastive Learning Guided Fusion Network for Brain CT and MRI.","authors":"Yuping Huang, Weisheng Li, Bin Xiao, Guofen Wang, Dan He, Xiaoyu Qiao","doi":"10.1109/JBHI.2025.3545172","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3545172","url":null,"abstract":"<p><p>Medical image fusion technology provides professionals with more detailed and precise diagnostic information. This paper introduces a new efficient CT and MRI fusion network, CLGFusion, based on a contrastive learning-guided network. CLGFusion includes two encoding branches at the feature encoding stage, enabling them to interact and learn from each other. The approach begins with training a single-view encoder to predict the feature representation of an image from varied augmented views. Simultaneously, the multi-view encoder is improved using the exponential moving average of the single-view encoder. Contrastive learning is integrated into medical image fusion by creating a feature contrast space without constructing negative samples. This feature contrast space cleverly uses the information of the difference in the feature product of the source image and its corresponding augmented image. It continuously guides the network to constantly optimize its fusion effect by combining the method of structural similarity loss, to achieve more accurate and efficient image fusion. This approach represents an end-to-end unsupervised fusion model. Experimental validation shows that our proposed method demonstrates performance comparable to state-of-the-art techniques in both subjective evaluation and objective metrics.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-25 DOI: 10.1109/JBHI.2025.3543686
COVID-BLUeS - A Prospective Study on the Value of AI in Lung Ultrasound Analysis
Nina Wiedemann, Dianne de Korte-de Boer, Matthias Richter, Sjors van de Weijer, Charlotte Buhre, Franz A M Eggert, Sophie Aarnoudse, Lotte Grevendonk, Steffen Rober, Carlijn M E Remie, Wolfgang Buhre, Ronald Henry, Jannis Born
As a lightweight and non-invasive imaging technique, lung ultrasound (LUS) has gained importance for assessing lung pathologies. The use of artificial intelligence (AI) in medical decision support systems is promising because LUS interpretation is time- and expertise-intensive; however, owing to the poor quality of the existing data used to train AI models, their usability for real-world applications remains unclear.
Methods: In a prospective study, we analyze data from 63 COVID-19 suspects (33 positive) collected at Maastricht University Medical Centre. Ultrasound recordings at six body locations were acquired following the BLUE protocol and manually labeled for severity of lung involvement. Anamnesis and complete blood count (CBC) analyses were conducted. Several AI models were applied and trained to detect pulmonary infection and assess its severity.
Results: The severity of the lung infection, as assigned by human annotators based on the LUS videos, is not significantly different between COVID-19 positive and negative patients. Nevertheless, the predictions of image-based AI models identify a COVID-19 infection with 65% accuracy when applied zero-shot (i.e., trained on other datasets), and with up to 79% accuracy after targeted training, whereas the accuracy based on human annotations is at most 65%. Multi-modal models combining images and CBC improve significantly over image-only models.
Conclusion: Although our analysis generally supports the value of AI in LUS assessment, the evaluated models fall short of the performance expected from previous work. We find this is due to 1) the heterogeneity of LUS datasets, which limits generalization to new data, 2) the frame-based processing of AI models, which ignores video-level information, and 3) the lack of multi-modal models that can extract the most relevant information from video-, image-, and variable-based inputs. To aid future research, we publish the dataset at: https://github.com/NinaWie/COVID-BLUES.
{"title":"COVID-BLUeS - A Prospective Study on the Value of AI in Lung Ultrasound Analysis.","authors":"Nina Wiedemann, Dianne de Korte-de Boer, Matthias Richter, Sjors van de Weijer, Charlotte Buhre, Franz A M Eggert, Sophie Aarnoudse, Lotte Grevendonk, Steffen Rober, Carlijn M E Remie, Wolfgang Buhre, Ronald Henry, Jannis Born","doi":"10.1109/JBHI.2025.3543686","DOIUrl":"10.1109/JBHI.2025.3543686","url":null,"abstract":"<p><p>As a lightweight and non-invasive imaging technique, lung ultrasound (LUS) has gained importance for assessing lung pathologies. The use of Artificial intelligence (AI) in medical decision support systems is promising due to the time- and expertise-intensive interpretation, however, due to the poor quality of existing data used for training AI models, their usability for real-world applications remains unclear.</p><p><strong>Methods: </strong>In a prospective study, we analyze data from 63 COVID-19 suspects (33 positive) collected at Maastricht University Medical Centre. Ultrasound recordings at six body locations were acquired following the BLUE protocol and manually labeled for severity of lung involvement. Anamnesis and complete blood count (CBC) analyses were conducted. Several AI models were applied and trained for detection and severity of pulmonary infection.</p><p><strong>Results: </strong>The severity of the lung infection, as assigned by human annotators based on the LUS videos, is not significantly different between COVID-19 positive and negative patients (). Nevertheless, the predictions of image-based AI models identify a COVID-19 infection with 65% accuracy when applied zero-shot (i.e., trained on other datasets), and up to 79% with targeted training, whereas the accuracy based on human annotations is at most 65%. Multi-modal models combining images and CBC improve significantly over image-only models.</p><p><strong>Conclusion: </strong>Although our analysis generally supports the value of AI in LUS assessment, the evaluated models fall short of the performance expected from previous work. We find this is due to 1) the heterogeneity of LUS datasets, limiting the generalization ability to new data, 2) the frame-based processing of AI models ignoring video-level information, and 3) lack of work on multi-modal models that can extract the most relevant information from video-, image- and variable-based inputs. To aid future research, we publish the dataset at: https://github.com/NinaWie/COVID-BLUES.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-24 DOI: 10.1109/JBHI.2025.3544966
Beyond the ground truth, XGBoost model applied to sleep spindle event detection
Enrique Gurdiel, Fernando Vaquerizo-Villar, Javier Gomez-Pilar, Gonzalo C Gutierrez-Tobal, Felix Del Campo, Roberto Hornero
Sleep spindles are microevents of the sleep electroencephalogram (EEG) whose functional interpretation is not fully clear. To streamline the identification process and make it more replicable, multiple automatic detectors have been proposed in the literature. Among these methods, algorithms based on deep learning have so far demonstrated superior accuracy in performance assessments. However, with these methods the rationale behind the model's decision-making process is hard to understand. In this study, we propose a novel machine-learning detection framework (SpinCo) based on exhaustive sliding-window feature extraction and the XGBoost algorithm, achieving performance close to state-of-the-art deep-learning techniques while depending on a fixed set of easily interpretable features. Additionally, we have developed a novel by-event evaluation metric that ensures symmetry and allows a probabilistic interpretation of the results. Through this metric, we enhance the interpretability of our evaluations and enable a direct assessment of inter-expert agreement in the manual annotation of spindle events. Finally, we propose a new type of performance assessment test based on estimates of the automatic method's ability to generalize to unseen experts and on its comparison with inter-expert agreement measurements. Hence, SpinCo is a robust automatic spindle detection technique that can be used for labeling raw EEG signals, and it sheds light on the metrics used for evaluation in this problem.
{"title":"Beyond the ground truth, XGBoost model applied to sleep spindle event detection.","authors":"Enrique Gurdiel, Fernando Vaquerizo-Villar, Javier Gomez-Pilar, Gonzalo C Gutierrez-Tobal, Felix Del Campo, Roberto Hornero","doi":"10.1109/JBHI.2025.3544966","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3544966","url":null,"abstract":"<p><p>Sleep spindles are microevents of the electroencephalogram (EEG) during sleep whose functional interpretation is not fully clear. To streamline the identification process and make it more replicable, multiple automatic detectors have been proposed in the literature. Among these methods, algorithms based on deep learning usually demonstrate superior accuracy in performance assessment up to now. However, using these methods, the rationale behind the model decision-making process is hard to understand. In this study, we propose a novel machine-learning detection framework (SpinCo) based on an exhaustive sliding window feature extraction and the application of XGBoost algorithm, achieving performance close to state-of-the-art deep-learning techniques while depending on a fixed set of easily interpretable features. Additionally, we have developed a novel by-event metric for evaluation that ensures symmetricity and allows a probabilistic interpretation of the results. Through the utilization of this metric, we have enhanced the interpretability of our evaluations and enabled a direct assessment of inter-expert agreement in the manual annotation of spindle events. Finally, we propose a new type of performance assessment test based on estimations of the automatic method's ability to generalize to unseen experts and its comparison with inter-expert agreement measurements. Hence, Spinco is a robust automatic spindle detection technique that can be used for labeling raw EEG signals and shed light on the metrics used for evaluation in this problem.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-24 DOI: 10.1109/JBHI.2025.3544612
Hierarchically Optimized Multiple Instance Learning With Multi-Magnification Pathological Images for Cerebral Tumor Diagnosis
Lianghui Zhu, Renao Yan, Tian Guan, Fenfen Zhang, Linlang Guo, Qiming He, Shanshan Shi, Huijuan Shi, Yonghong He, Anjia Han
Accurate diagnosis of cerebral tumors is crucial for effective clinical therapeutics and prognosis. However, the limited availability of brain biopsy tissue and the scarcity of pathologists specializing in cerebral tumors hinder comprehensive clinical testing for precise diagnosis. To address these challenges, we first established a brain tumor dataset of 3,520 cases collected from multiple centers. We then proposed a novel Hierarchically Optimized Multiple Instance Learning (HOMIL) method for classifying six common brain tumor types, grading gliomas, and predicting the origin of brain metastatic cancers. The feature encoder and aggregator in HOMIL were trained alternately based on specific datasets and tasks. Compared to other multiple instance learning (MIL) methods, HOMIL achieved state-of-the-art performance with impressive accuracies: 93.29% / 85.60% for brain tumor classification, 91.21% / 96.93% for glioma grading, and 86.36% / 79.28% for origin determination on internal / external datasets. Additionally, HOMIL effectively located multi-scale regions of interest, enabling in-depth analysis through features and heatmaps. Extensive visualization demonstrated HOMIL's ability to cluster features of the same type while establishing distinct boundaries between tumor types. It also identified critical areas on pathological slides, regardless of tumor size.
{"title":"Hierarchically Optimized Multiple Instance Learning With Multi-Magnification Pathological Images for Cerebral Tumor Diagnosis.","authors":"Lianghui Zhu, Renao Yan, Tian Guan, Fenfen Zhang, Linlang Guo, Qiming He, Shanshan Shi, Huijuan Shi, Yonghong He, Anjia Han","doi":"10.1109/JBHI.2025.3544612","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3544612","url":null,"abstract":"<p><p>Accurate diagnosis of cerebral tumors is crucial for effective clinical therapeutics and prognosis. However, limitations in brain biopsy tissues and the scarcity of pathologists specializing in cerebral tumors hinder comprehensive clinical tests for precise diagnosis. To address these challenges, we first established a brain tumor dataset of 3,520 cases collected from multiple centers. We then proposed a novel Hierarchically Optimized Multiple Instance Learning (HOMIL) method for classifying six common brain tumor types, glioma grading, and predicting the origin of brain metastatic cancers. The feature encoder and aggregator in HOMIL were trained alternately based on specific datasets and tasks. Compared to other multiple instance learning (MIL) methods, HOMIL achieved state-of-the-art performance with impressive accuracies: 93.29% / 85.60% for brain tumor classification, 91.21% / 96.93% for glioma grading, and 86.36% / 79.28% for origin determination on internal/external datasets. Additionally, HOMIL effectively located multi-scale regions of interest, enabling an in-depth analysis through features and heatmaps. Extensive visualization demonstrated HOMIL's ability to cluster features within the same type while establishing distinct boundaries between tumor types. It also identified critical areas on pathological slides, regardless of tumor size.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-24 DOI: 10.1109/JBHI.2025.3545265
Multi-Scale Spatio-Temporal Attention Network for Epileptic Seizure Prediction
Qiulei Dong, Han Zhang, Jun Xiao, Jiayin Sun
Epileptic seizure prediction from electroencephalogram (EEG) data has attracted much attention in the clinical diagnosis and treatment of epilepsy. Most existing methods in the literature extract either spatial or temporal features at a single scale from EEG data; however, because EEG data are generally complex and severely noisy, their learned features tend to be insufficiently discriminative, leading to low-accuracy predictions. To address this problem, we propose a Multi-scale Spatio-temporal Attention Network, called MSAN, to learn discriminative features for seizure prediction; it contains a backbone module, a spatial pyramid module, and a multi-scale sequential aggregation module. The backbone module extracts initial spatial features from the input EEG spectrograms, and the pyramid module learns multi-scale features from these initial features. Taking the multi-scale features as temporal inputs, the sequential aggregation module employs multiple Long Short-Term Memory (LSTM) blocks to aggregate them. In addition, a dual-loss function is introduced to alleviate the class imbalance problem. The proposed method achieves an average sensitivity of 96.27% with a mean false prediction rate of 0.00/h on the CHB-MIT dataset and an average sensitivity of 93.57% with a mean false prediction rate of 0.044/h on the Kaggle dataset. The comparative results demonstrate that the proposed method outperforms 10 state-of-the-art epileptic seizure prediction models.
{"title":"Multi-Scale Spatio-Temporal Attention Network for Epileptic Seizure Prediction.","authors":"Qiulei Dong, Han Zhang, Jun Xiao, Jiayin Sun","doi":"10.1109/JBHI.2025.3545265","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3545265","url":null,"abstract":"<p><p>Epileptic seizure prediction from electroencephalogram (EEG) data has attracted much attention in the clinical diagnosis and treatment of epilepsy. Most of the existing methods in literature extract either spatial or temporal features at a single scale from EEG data, however, their learned features are generally less discriminative since the EEG data is complex and severely noisy in general, leading to low-accuracy predictions. To address this problem, we propose a Multi-scale Spatio-temporal Attention Network to learn discriminative features for seizure prediction, called MSAN, which contains a backbone module, a spatial pyramid module, and a multi-scale sequential aggregation module. The backbone module is to extract initial spatial features from the input EEG spectrograms, and the pyramid module is introduced to learn multi-scale features from the initial features. Then by taking these multi-scale features as input temporal features, the sequential aggregation module employs multiple Long Short-Term Memory(LSTM) blocks to aggregate these features. In addition, a dual-loss function is introduced to alleviate the class imbalance problem. The proposed method achieves an average sensitivity of 96.27% with a mean false prediction rate of 0.00/h on the CHB-MIT dataset and an average sensitivity of 93.57% with a mean false prediction rate of 0.044/h on the Kaggle dataset. The comparative results demonstrate that the proposed method outperforms 10 state-of-the-art epileptic seizure prediction models.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-24 DOI: 10.1109/JBHI.2025.3544548
MambaSAM: A Visual Mamba-Adapted SAM Framework for Medical Image Segmentation
Pengchen Liang, Leijun Shi, Bin Pu, Renkai Wu, Jianguo Chen, Lixin Zhou, Lite Xu, Zhuangzhuang Chen, Qing Chang, Yiwei Li
The Segment Anything Model (SAM) has shown exceptional versatility in segmentation tasks across various natural image scenarios. However, its application to medical image segmentation poses significant challenges due to the intricate anatomical details and domain-specific characteristics inherent in medical images. To address these challenges, we propose a novel VMamba adapter framework that integrates a lightweight, trainable Visual Mamba (VMamba) branch with the pre-trained SAM ViT encoder. The VMamba adapter accurately captures multi-scale contextual correlations, integrates global and local information, and reduces the ambiguities that arise when only local features are used. Specifically, we propose a novel cross-branch attention (CBA) mechanism to facilitate effective interaction between the SAM and VMamba branches. This mechanism enables the model to learn and adapt more efficiently to the nuances of medical images, extracting rich, complementary features that enhance its representational capacity. Beyond architectural enhancements, we streamline the segmentation workflow by eliminating the need for prompt-driven input mechanisms. This results in an autonomous prediction model that reduces manual input requirements and improves operational efficiency. In addition, our method introduces only minimal additional trainable parameters, offering an efficient solution for medical image segmentation. Extensive evaluations on four medical image datasets demonstrate that our VMamba adapter framework achieves state-of-the-art performance. Specifically, on the ACDC dataset with limited training data, our method achieves an average Dice coefficient improvement of 0.18 and reduces the Hausdorff distance by 20.38 mm compared to AutoSAM.
{"title":"MambaSAM: A Visual Mamba-Adapted SAM Framework for Medical Image Segmentation.","authors":"Pengchen Liang, Leijun Shi, Bin Pu, Renkai Wu, Jianguo Chen, Lixin Zhou, Lite Xu, Zhuangzhuang Chen, Qing Chang, Yiwei Li","doi":"10.1109/JBHI.2025.3544548","DOIUrl":"https://doi.org/10.1109/JBHI.2025.3544548","url":null,"abstract":"<p><p>The Segment Anything Model (SAM) has shown exceptional versatility in segmentation tasks across various natural image scenarios. However, its application to medical image segmentation poses significant challenges due to the intricate anatomical details and domain-specific characteristics inherent in medical images. To address these challenges, we propose a novel VMamba adapter framework that integrates a lightweight, trainable Visual Mamba (VMamba) branch with the pre-trained SAM ViT encoder. The VMamba adapter accurately captures multi-scale contextual correlations, integrates global and local information, and reduces ambiguities arising from local features only. Specifically, we propose a novel cross-branch attention (CBA) mechanism to facilitate effective interaction between the SAM and VMamba branches. This mechanism enables the model to learn and adapt more efficiently to the nuances of medical images, extracting rich, complementary features that enhance its representational capacity. Beyond architectural enhancements, we streamline the segmentation workflow by eliminating the need for prompt-driven input mechanisms. This results in an autonomous prediction model that reduces manual input requirements and improves operational efficiency. In addition, our method introduces only minimal additional trainable parameters, offering an efficient solution for medical image segmentation. Extensive evaluations of four medical image datasets demonstrate that our VMamba adapter framework achieves state-of-the-art performance. Specifically, on the ACDC dataset with limited training data, our method achieves an average Dice coefficient improvement of 0.18 and reduces the Hausdorff distance by 20.38 mm compared to the AutoSAM.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143541570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}