{"title":"CrossConvPyramid: Deep Multimodal Fusion for Epileptic Magnetoencephalography Spike Detection.","authors":"Liang Zhang, Shurong Sheng, Xiongfei Wang, Jia-Hong Gao, Yi Sun, Kuntao Xiao, Wanli Yang, Pengfei Teng, Guoming Luan, Zhao Lv","doi":"10.1109/JBHI.2025.3538582","DOIUrl":null,"url":null,"abstract":"<p><p>Magnetoencephalography (MEG) is a vital non-invasive tool for epilepsy analysis, as it captures high-resolution signals that reflect changes in brain activity over time. The automated detection of epileptic spikes within these signals can significantly reduce the labor and time required for manual annotation of MEG recording data, thereby aiding clinicians in identifying epileptogenic foci and evaluating treatment prognosis. Research in this domain often utilizes the raw, multi-channel signals from MEG scans for spike detection, commonly neglecting the multi-channel spiking patterns from spatially adjacent channels. Moreover, epileptic spikes share considerable morphological similarities with artifact signals within the recordings, posing a challenge for models to differentiate between the two. In this paper, we introduce a multimodal fusion framework that addresses these two challenges collectively. Instead of relying solely on the signal recordings, our framework also mines knowledge from their corresponding topography-map images, which encapsulate the spatial context and amplitude distribution of the input signals. To facilitate more effective data fusion, we present a novel multimodal feature fusion technique called CrossConvPyramid, built upon a convolutional pyramid architecture augmented by an attention mechanism. It initially employs cross-attention and a convolutional pyramid to encode inter-modal correlations within the intermediate features extracted by individual unimodal networks. Subsequently, it utilizes a self-attention mechanism to refine and select the most salient features from both inter-modal and unimodal features, specifically tailored for the spike classification task. Our method achieved the average F1 scores of 92.88% and 95.23% across two distinct real-world MEG datasets from separate centers, respectively outperforming the current state-of-the-art by 2.31% and 0.88%. We plan to release the code on GitHub later.</p>","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"PP ","pages":""},"PeriodicalIF":6.7000,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Biomedical and Health Informatics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1109/JBHI.2025.3538582","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Magnetoencephalography (MEG) is a vital non-invasive tool for epilepsy analysis, as it captures high-resolution signals that reflect changes in brain activity over time. Automated detection of epileptic spikes within these signals can significantly reduce the labor and time required for manual annotation of MEG recordings, thereby aiding clinicians in identifying epileptogenic foci and evaluating treatment prognosis. Research in this domain often uses the raw, multi-channel signals from MEG scans for spike detection, commonly neglecting the multi-channel spiking patterns of spatially adjacent channels. Moreover, epileptic spikes share considerable morphological similarities with artifact signals in the recordings, making it difficult for models to differentiate between the two. In this paper, we introduce a multimodal fusion framework that addresses these two challenges jointly. Instead of relying solely on the signal recordings, our framework also mines knowledge from their corresponding topography-map images, which encapsulate the spatial context and amplitude distribution of the input signals. To facilitate more effective data fusion, we present a novel multimodal feature fusion technique called CrossConvPyramid, built upon a convolutional pyramid architecture augmented by an attention mechanism. It first employs cross-attention and a convolutional pyramid to encode inter-modal correlations within the intermediate features extracted by individual unimodal networks. It then applies a self-attention mechanism to refine and select the most salient features from both the inter-modal and unimodal features, tailored to the spike classification task. Our method achieved average F1 scores of 92.88% and 95.23% on two distinct real-world MEG datasets from separate centers, outperforming the current state-of-the-art by 2.31% and 0.88%, respectively. We plan to release the code on GitHub later.
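To make the fusion pipeline described in the abstract more concrete, the following is a minimal, hypothetical sketch of the idea: features from a signal branch attend to features from a topography-map branch via cross-attention, a small convolutional pyramid coarsens the fused sequence, and a final self-attention step re-weights unimodal and fused tokens before classification. All module names, dimensions, and the two-branch setup are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a cross-attention + convolutional-pyramid fusion module.
# This is NOT the paper's code; shapes and components are assumptions for illustration.
import torch
import torch.nn as nn


class CrossConvPyramidSketch(nn.Module):
    def __init__(self, dim=128, heads=4, levels=3, num_classes=2):
        super().__init__()
        # Cross-attention: signal features query the topography-map features.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Convolutional "pyramid": stacked 1-D convolutions that halve the
        # sequence length at each level while keeping the feature width.
        self.pyramid = nn.ModuleList(
            [nn.Conv1d(dim, dim, kernel_size=3, stride=2, padding=1) for _ in range(levels)]
        )
        # Self-attention to refine pooled unimodal features and fused features.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, sig_feat, topo_feat):
        # sig_feat, topo_feat: (batch, seq_len, dim) intermediate features from
        # two unimodal encoders (e.g. a CNN over MEG signals and a CNN over
        # topography-map images, flattened to token sequences).
        fused, _ = self.cross_attn(query=sig_feat, key=topo_feat, value=topo_feat)
        x = fused.transpose(1, 2)            # (batch, dim, seq_len) for Conv1d
        for conv in self.pyramid:
            x = torch.relu(conv(x))          # progressively coarser fused features
        x = x.transpose(1, 2)                # back to (batch, seq', dim)
        # Combine pooled unimodal tokens with the fused pyramid output and let
        # self-attention select the most salient features for classification.
        tokens = torch.cat(
            [sig_feat.mean(1, keepdim=True), topo_feat.mean(1, keepdim=True), x], dim=1
        )
        refined, _ = self.self_attn(tokens, tokens, tokens)
        return self.classifier(refined.mean(dim=1))  # spike vs. non-spike logits


# Usage example: batch of 8 samples, 64-token feature sequences of width 128.
model = CrossConvPyramidSketch()
logits = model(torch.randn(8, 64, 128), torch.randn(8, 64, 128))
print(logits.shape)  # torch.Size([8, 2])
```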
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.