{"title":"EEGCiD: EEG Condensation Into Diffusion Model","authors":"Junfu Chen;Dechang Pi;Xiaoyi Jiang;Feng Gao;Bi Wang;Yang Chen","doi":"10.1109/TASE.2024.3486203","DOIUrl":null,"url":null,"abstract":"Electroencephalography (EEG)-based applications in Brain-Computer Interfaces (BCIs), neurological disease diagnosis, rehabilitation, and other areas rely on the utilization of extensive data for model development. Nevertheless, this raises concerns regarding storage and privacy, since model development needs a significant amount of data, and EEG sharing discloses sensitive information such as identity and health. To address this challenging problem, we provide the paradigm of EEG condensation, aiming to generate a synthetic sample set that is highly information-concentrated yet not visually similar. Correspondingly, we propose a novel dataset condensation framework where the knowledge of the original EEG dataset is condensed into diffusion models, named EEGCiD. Specifically, EEGCiD first utilizes a deterministic denoising diffusion implicit model (DDIM) to store the information of the original dataset and optimizes the condensation latent codes z to obtain the EEG condensation dataset. Further, to enhance the modeling of EEG knowledge in DDIM, we design a transformer architecture incorporating the spatial and temporal self-attention block (STSA) to replace the traditional U-Net backbone. In the condensation phase, EEGCiD randomly initializes a subset of samples from the original dataset to obtain the condensation latent codes z through the forward process in DDIM. Then, it optimizes z by matching the feature distributions in multiple EEG decoding models between the synthetic samples and the original dataset. Extensive experiments across three EEG datasets demonstrate that the condensation dataset from the proposed model not only achieves superior classification performance with limited sample sizes, but also effectively prevents membership inference attacks (MIA). 
Note to Practitioners—This paper aims to investigate a novel EEG generation paradigm that extracts representative synthetic samples from large-scale datasets. Existing studies in EEG generation primarily concentrate on generating real-like signals, and some work claims that the generated EEG can serve as a substitute for the original dataset to achieve privacy preservation. In the EEGCiD framework, the deterministic DDIM is pre-trained with the original dataset to store the knowledge. Besides, an ensemble feature matching strategy is proposed to condense the information from the original dataset into a small latent code set. Experiments on three datasets demonstrate that EEGCiD addresses two fundamental challenges: 1) obtaining superior classification performance within a small dataset (limited storage capacity); 2) avoiding potential privacy issues during EEG sharing and transmission.","PeriodicalId":51060,"journal":{"name":"IEEE Transactions on Automation Science and Engineering","volume":"22 ","pages":"8502-8518"},"PeriodicalIF":6.4000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Automation Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10738504/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Electroencephalography (EEG)-based applications in Brain-Computer Interfaces (BCIs), neurological disease diagnosis, rehabilitation, and other areas rely on extensive data for model development. This reliance raises storage and privacy concerns: model development demands large volumes of data, and sharing EEG discloses sensitive information such as identity and health status. To address this challenging problem, we introduce the paradigm of EEG condensation, which aims to generate a synthetic sample set that is highly information-dense yet not visually similar to the real recordings. Correspondingly, we propose a novel dataset condensation framework, named EEGCiD, in which the knowledge of the original EEG dataset is condensed into diffusion models. Specifically, EEGCiD first uses a deterministic denoising diffusion implicit model (DDIM) to store the information of the original dataset, and then optimizes the condensation latent codes z to obtain the EEG condensation dataset. Further, to enhance the modeling of EEG knowledge in the DDIM, we design a transformer architecture incorporating a spatial and temporal self-attention (STSA) block to replace the traditional U-Net backbone. In the condensation phase, EEGCiD randomly initializes a subset of samples from the original dataset and maps them to the condensation latent codes z through the DDIM forward process. It then optimizes z by matching the feature distributions of the synthetic samples and the original dataset across multiple EEG decoding models. Extensive experiments on three EEG datasets demonstrate that the condensed dataset produced by the proposed model not only achieves superior classification performance with limited sample sizes, but also effectively resists membership inference attacks (MIA).
Note to Practitioners
This paper investigates a novel EEG generation paradigm that extracts representative synthetic samples from large-scale datasets. Existing studies in EEG generation primarily concentrate on generating realistic signals, and some work claims that the generated EEG can substitute for the original dataset to achieve privacy preservation. In the EEGCiD framework, the deterministic DDIM is pre-trained on the original dataset to store its knowledge. In addition, an ensemble feature matching strategy is proposed to condense the information of the original dataset into a small set of latent codes. Experiments on three datasets demonstrate that EEGCiD addresses two fundamental challenges: 1) achieving superior classification performance with a small dataset (limited storage capacity); and 2) avoiding potential privacy issues during EEG sharing and transmission.
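The abstract describes two core operations: a deterministic DDIM forward (inversion) process that maps real samples to latent codes z, and an objective that optimizes z by matching mean feature statistics between synthetic and real samples across several decoding models. The paper's exact loss and encoder ensemble are not reproduced here; the following is a minimal numpy sketch under stated assumptions, with hypothetical random-projection encoders standing in for the pre-trained EEG decoding models:

```python
import numpy as np

def ddim_inversion_step(x_t, eps_pred, a_t, a_next):
    """One deterministic DDIM step between cumulative noise levels a_t and a_next.

    x_t:      current (noised) samples, shape (batch, dim)
    eps_pred: noise predicted by the diffusion network at this step
    a_t/a_next: cumulative alpha-bar values; a_next < a_t moves toward noise
    Iterating this over the schedule yields the condensation latent codes z.
    """
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    return np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps_pred

def make_random_encoder(in_dim, feat_dim, seed):
    """Hypothetical stand-in for one pre-trained EEG decoding model:
    a fixed random projection followed by ReLU."""
    w = np.random.default_rng(seed).normal(size=(in_dim, feat_dim))
    return lambda x: np.maximum(x @ w, 0.0)

def ensemble_feature_matching_loss(synth, real, encoders):
    """Squared distance between mean feature vectors of the synthetic and
    real sets, averaged over the encoder ensemble (a distribution-matching
    style objective; minimized with respect to the synthetic samples)."""
    losses = [np.sum((f(synth).mean(axis=0) - f(real).mean(axis=0)) ** 2)
              for f in encoders]
    return float(np.mean(losses))

# Toy usage: EEG windows flattened to (channels * time) vectors.
rng = np.random.default_rng(0)
real = rng.normal(size=(32, 8))            # "original dataset" subset
z = ddim_inversion_step(real, rng.normal(size=real.shape), 0.9, 0.5)
encoders = [make_random_encoder(8, 16, s) for s in range(3)]
loss = ensemble_feature_matching_loss(z, real, encoders)
```

In the full method, the synthetic samples would be decoded from z by the (deterministic) DDIM before the loss is computed, and z would be updated by gradient descent on this loss; the sketch only shows the shape of the objective.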
Journal Introduction
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.