{"title":"EEGCiD: EEG Condensation Into Diffusion Model","authors":"Junfu Chen;Dechang Pi;Xiaoyi Jiang;Feng Gao;Bi Wang;Yang Chen","doi":"10.1109/TASE.2024.3486203","DOIUrl":null,"url":null,"abstract":"Electroencephalography (EEG)-based applications in Brain-Computer Interfaces (BCIs), neurological disease diagnosis, rehabilitation, and other areas rely on the utilization of extensive data for model development. Nevertheless, this raises concerns regarding storage and privacy, since model development needs a significant amount of data, and EEG sharing discloses sensitive information such as identity and health. To address this challenging problem, we provide the paradigm of EEG condensation, aiming to generate a synthetic sample set that is highly information-concentrated yet not visually similar. Correspondingly, we propose a novel dataset condensation framework where the knowledge of the original EEG dataset is condensed into diffusion models, named EEGCiD. Specifically, EEGCiD first utilizes a deterministic denoising diffusion implicit model (DDIM) to store the information of the original dataset and optimizes the condensation latent codes z to obtain the EEG condensation dataset. Further, to enhance the modeling of EEG knowledge in DDIM, we design a transformer architecture incorporating the spatial and temporal self-attention block (STSA) to replace the traditional U-Net backbone. In the condensation phase, EEGCiD randomly initializes a subset of samples from the original dataset to obtain the condensation latent codes z through the forward process in DDIM. Then, it optimizes z by matching the feature distributions in multiple EEG decoding models between the synthetic samples and the original dataset. Extensive experiments across three EEG datasets demonstrate that the condensation dataset from the proposed model not only achieves superior classification performance with limited sample sizes, but also effectively prevents membership inference attacks (MIA). 
Note to Practitioners—This paper aims to investigate a novel EEG generation paradigm that extracts representative synthetic samples from large-scale datasets. Existing studies in EEG generation primarily concentrate on generating real-like signals, and some work claims that the generated EEG can serve as a substitute for the original dataset to achieve privacy preservation. In the EEGCiD framework, the deterministic DDIM is pre-trained with the original dataset to store the knowledge. Besides, an ensemble feature matching strategy is proposed to condense the information from the original dataset into a small latent code set. Experiments on three datasets demonstrate that EEGCiD addresses two fundamental challenges: 1) obtaining superior classification performance within a small dataset (limited storage capacity); 2) avoiding potential privacy issues during EEG sharing and transmission.","PeriodicalId":51060,"journal":{"name":"IEEE Transactions on Automation Science and Engineering","volume":"22 ","pages":"8502-8518"},"PeriodicalIF":6.4000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Automation Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10738504/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Electroencephalography (EEG)-based applications in Brain-Computer Interfaces (BCIs), neurological disease diagnosis, rehabilitation, and other areas rely on extensive data for model development. This reliance raises storage and privacy concerns: model development demands large volumes of data, and sharing EEG discloses sensitive information such as identity and health status. To address this challenging problem, we introduce the paradigm of EEG condensation, which aims to generate a synthetic sample set that is highly information-dense yet not visually similar to the real recordings. Correspondingly, we propose a novel dataset condensation framework, named EEGCiD, in which the knowledge of the original EEG dataset is condensed into diffusion models. Specifically, EEGCiD first uses a deterministic denoising diffusion implicit model (DDIM) to store the information of the original dataset, and then optimizes the condensation latent codes z to obtain the EEG condensation dataset. Further, to enhance the modeling of EEG knowledge in the DDIM, we design a transformer architecture incorporating a spatial and temporal self-attention (STSA) block to replace the traditional U-Net backbone. In the condensation phase, EEGCiD randomly initializes a subset of samples from the original dataset and maps them to the condensation latent codes z through the DDIM forward process. It then optimizes z by matching the feature distributions of the synthetic samples and the original dataset across multiple EEG decoding models. Extensive experiments on three EEG datasets demonstrate that the condensed dataset produced by the proposed model not only achieves superior classification performance with limited sample sizes, but also effectively resists membership inference attacks (MIA).
Note to Practitioners
This paper investigates a novel EEG generation paradigm that extracts representative synthetic samples from large-scale datasets. Existing studies in EEG generation primarily concentrate on generating realistic signals, and some work claims that the generated EEG can substitute for the original dataset to achieve privacy preservation. In the EEGCiD framework, the deterministic DDIM is pre-trained on the original dataset to store its knowledge. In addition, an ensemble feature matching strategy is proposed to condense the information of the original dataset into a small set of latent codes. Experiments on three datasets demonstrate that EEGCiD addresses two fundamental challenges: 1) achieving superior classification performance with a small dataset (limited storage capacity); and 2) avoiding potential privacy issues during EEG sharing and transmission.
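The abstract describes two core operations: a deterministic DDIM forward (inversion) process that maps real samples to latent codes z, and an objective that optimizes z by matching mean feature statistics between synthetic and real samples across several decoding models. The paper's exact loss and encoder ensemble are not reproduced here; the following is a minimal numpy sketch under stated assumptions, with hypothetical random-projection encoders standing in for the pre-trained EEG decoding models:

```python
import numpy as np

def ddim_inversion_step(x_t, eps_pred, a_t, a_next):
    """One deterministic DDIM step between cumulative noise levels a_t and a_next.

    x_t:      current (noised) samples, shape (batch, dim)
    eps_pred: noise predicted by the diffusion network at this step
    a_t/a_next: cumulative alpha-bar values; a_next < a_t moves toward noise
    Iterating this over the schedule yields the condensation latent codes z.
    """
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    return np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps_pred

def make_random_encoder(in_dim, feat_dim, seed):
    """Hypothetical stand-in for one pre-trained EEG decoding model:
    a fixed random projection followed by ReLU."""
    w = np.random.default_rng(seed).normal(size=(in_dim, feat_dim))
    return lambda x: np.maximum(x @ w, 0.0)

def ensemble_feature_matching_loss(synth, real, encoders):
    """Squared distance between mean feature vectors of the synthetic and
    real sets, averaged over the encoder ensemble (a distribution-matching
    style objective; minimized with respect to the synthetic samples)."""
    losses = [np.sum((f(synth).mean(axis=0) - f(real).mean(axis=0)) ** 2)
              for f in encoders]
    return float(np.mean(losses))

# Toy usage: EEG windows flattened to (channels * time) vectors.
rng = np.random.default_rng(0)
real = rng.normal(size=(32, 8))            # "original dataset" subset
z = ddim_inversion_step(real, rng.normal(size=real.shape), 0.9, 0.5)
encoders = [make_random_encoder(8, 16, s) for s in range(3)]
loss = ensemble_feature_matching_loss(z, real, encoders)
```

In the full method, the synthetic samples would be decoded from z by the (deterministic) DDIM before the loss is computed, and z would be updated by gradient descent on this loss; the sketch only shows the shape of the objective.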
Journal Introduction
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.