EEGCiD: EEG Condensation Into Diffusion Model

IF 6.4 · Zone 2 (Computer Science) · JCR Q1 (Automation & Control Systems) · IEEE Transactions on Automation Science and Engineering · Pub Date: 2024-10-30 · DOI: 10.1109/TASE.2024.3486203
Junfu Chen;Dechang Pi;Xiaoyi Jiang;Feng Gao;Bi Wang;Yang Chen
Volume 22, pages 8502-8518. Journal Article. URL: https://ieeexplore.ieee.org/document/10738504/
Citations: 0

Abstract

Electroencephalography (EEG)-based applications in brain-computer interfaces (BCIs), neurological disease diagnosis, rehabilitation, and other areas rely on extensive data for model development. This raises storage and privacy concerns: model development requires large amounts of data, and sharing EEG discloses sensitive information such as identity and health status. To address this problem, we introduce the paradigm of EEG condensation, which aims to generate a synthetic sample set that is highly information-concentrated yet not visually similar to the original signals. Correspondingly, we propose EEGCiD, a novel dataset condensation framework in which the knowledge of the original EEG dataset is condensed into a diffusion model. Specifically, EEGCiD uses a deterministic denoising diffusion implicit model (DDIM) to store the information of the original dataset and optimizes the condensation latent codes z to obtain the condensed EEG dataset. To enhance the modeling of EEG knowledge in the DDIM, we design a transformer architecture built on a spatial and temporal self-attention (STSA) block to replace the traditional U-Net backbone. In the condensation phase, EEGCiD initializes from a randomly chosen subset of samples from the original dataset and obtains the condensation latent codes z through the DDIM forward process. It then optimizes z by matching the feature distributions of the synthetic samples and the original dataset in multiple EEG decoding models. Extensive experiments on three EEG datasets demonstrate that the condensed dataset produced by the proposed model not only achieves superior classification performance with limited sample sizes but also effectively prevents membership inference attacks (MIA).

Note to Practitioners: This paper investigates a novel EEG generation paradigm that extracts representative synthetic samples from large-scale datasets. Existing studies in EEG generation primarily concentrate on generating realistic signals, and some work claims that the generated EEG can substitute for the original dataset to preserve privacy. In the EEGCiD framework, the deterministic DDIM is pre-trained on the original dataset to store its knowledge. In addition, an ensemble feature matching strategy is proposed to condense the information from the original dataset into a small set of latent codes. Experiments on three datasets demonstrate that EEGCiD addresses two fundamental challenges: 1) obtaining superior classification performance with a small dataset (limited storage capacity); 2) avoiding potential privacy issues during EEG sharing and transmission.
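
The condensation steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The noise predictor (toy_eps), the noise schedule (alphas), the random ReLU feature extractors, and all function names are hypothetical stand-ins: the paper uses a pre-trained STSA transformer as the DDIM backbone and real EEG decoding models as feature extractors. The sketch shows only (a) the deterministic DDIM update used to map samples to latent codes z and back, and (b) an ensemble loss that compares mean features of synthetic and original data.

```python
import numpy as np

def ddim_step(x, eps, a_from, a_to):
    """One deterministic DDIM update (eta = 0) between two noise levels."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_encode(x0, eps_model, alphas):
    """Forward (inversion) pass: map samples to latent codes z."""
    x = x0
    for t in range(len(alphas) - 1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t + 1])
    return x

def ddim_decode(z, eps_model, alphas):
    """Reverse pass: map latent codes z back to synthetic samples."""
    x = z
    for t in range(len(alphas) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t - 1])
    return x

def ensemble_matching_loss(synthetic, real, feature_extractors):
    """Average squared distance between mean features over all extractors."""
    loss = 0.0
    for extract in feature_extractors:
        diff = extract(synthetic).mean(axis=0) - extract(real).mean(axis=0)
        loss += float(np.sum(diff ** 2))
    return loss / len(feature_extractors)

def make_extractor(seed, in_dim, out_dim=16):
    """Hypothetical stand-in for an EEG decoding model: random ReLU features."""
    w = np.random.default_rng(seed).normal(size=(in_dim, out_dim))
    return lambda x: np.maximum(x.reshape(len(x), -1) @ w, 0.0)

# Assumed toy setup: 64 trials, 8 channels, 32 time points of random data.
rng = np.random.default_rng(0)
real = rng.normal(size=(64, 8, 32))

# Assumed schedule of cumulative alpha values, from almost clean to noisy.
alphas = np.linspace(0.999, 0.05, 20)

# Assumed toy noise predictor; a trained network would take its place.
toy_eps = lambda x, t: 0.1 * np.tanh(x)

extractors = [make_extractor(seed, in_dim=8 * 32) for seed in (1, 2, 3)]

# Initialize latent codes from a small random subset of the original data.
subset = real[rng.choice(len(real), size=4, replace=False)]
z = ddim_encode(subset, toy_eps, alphas)

# Decode the latent codes and measure how well their features match.
synthetic = ddim_decode(z, toy_eps, alphas)
loss = ensemble_matching_loss(synthetic, real, extractors)
print("latent shape:", z.shape, "matching loss:", loss)
```

In a full pipeline, z would be updated by gradient descent on this loss through the decoder, which calls for an automatic differentiation framework; the sketch evaluates the loss once and leaves the optimization loop out.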
Source journal
IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)
CiteScore: 12.50
Self-citation rate: 14.30%
Articles published: 404
Review time: 3.0 months
Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.