A Deep Spatio-Temporal Model for EEG-Based Imagined Speech Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | Pub Date: 2021-06-06 | DOI: 10.1109/ICASSP39728.2021.9413989

Pradeep Kumar, E. Scheme


Citations: 11

Abstract

Automatic speech recognition interfaces are becoming increasingly pervasive in daily life as a means of interacting with and controlling electronic devices. Current speech interfaces, however, are infeasible for a variety of users and use cases, such as patients who suffer from locked-in syndrome or those who need privacy. In these cases, an interface based on envisioned speech, in which the user imagines what they want to say, could be of benefit. Consequently, in this work, we propose an imagined speech Brain-Computer Interface (BCI) using Electroencephalogram (EEG) signals. EEG signals are processed using a deep spatio-temporal learning architecture in which 1D Convolutional Neural Networks (CNNs) model the spatial structure and Long Short-Term Memory (LSTM) units model the temporal structure. The LSTM units are implemented in a many-to-many fashion to produce a time series of imagined speech outputs. Majority vote (MV) post-processing over this series further improves the performance of the system. The performance is evaluated on two publicly available datasets: one to test the performance of the tuned model, and another to test its generalization to a new dataset. The proposed architecture outperforms previous results with improvements of up to 23.7%.
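The majority-vote step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that the many-to-many LSTM emits one class label per time step and that the trial-level decision is simply the most frequent label. The function name `majority_vote`, the example labels, and the handling of ties are hypothetical; the abstract does not specify these details, so this is not the authors' implementation.

```python
from collections import Counter
from typing import Sequence


def majority_vote(step_predictions: Sequence[int]) -> int:
    """Return the most frequent class label in a sequence of per-step labels.

    Assumption: each element is the class predicted at one time step of a
    single trial. Ties are not treated specially in this sketch.
    """
    if not step_predictions:
        raise ValueError("need at least one per-step prediction")
    # Count how often each label occurs and keep the most common one.
    label, _count = Counter(step_predictions).most_common(1)[0]
    return label


# Hypothetical per-time-step outputs for one imagined-speech trial.
trial_outputs = [2, 2, 0, 2, 1, 2, 2, 0]
trial_label = majority_vote(trial_outputs)
print(trial_label)
```

In this toy trial, isolated per-step errors (the labels 0 and 1) are outvoted by the dominant label, which is the intuition behind using a vote over the output series rather than a single prediction.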


