A Multi-view Spectral-Spatial-Temporal Masked Autoencoder for Decoding Emotions with Self-supervised Learning
Rui Li, Yiting Wang, Wei-Long Zheng, Baoliang Lu
Proceedings of the 30th ACM International Conference on Multimedia, published 2022-10-10
DOI: 10.1145/3503161.3548243 (https://doi.org/10.1145/3503161.3548243)
Citations: 7
Abstract
Affective brain-computer interfaces have advanced considerably, to the point that researchers can successfully interpret labeled, clean EEG data collected in laboratory settings. However, annotating EEG data is time-consuming and labor-intensive, which limits its application in practical scenarios. Furthermore, EEG data collected in daily life may be partially damaged, because EEG signals are sensitive to noise. In this paper, we propose a Multi-view Spectral-Spatial-Temporal Masked Autoencoder (MV-SSTMA) with self-supervised learning to address these challenges and move toward daily applications. MV-SSTMA is based on a multi-view CNN-Transformer hybrid structure that interprets the emotion-related information in EEG signals from spectral, spatial, and temporal perspectives. Our model consists of three stages: 1) in the generalized pre-training stage, channels of unlabeled EEG data from all subjects are randomly masked and then reconstructed, so that the model learns generic representations of EEG data; 2) in the personalized calibration stage, only a small amount of labeled data from a specific subject is used to calibrate the model; 3) in the personal test stage, the model decodes that subject's emotions from both intact EEG data and damaged data with missing channels. Extensive experiments on two public emotional EEG datasets demonstrate that the proposed model achieves state-of-the-art emotion recognition performance. Moreover, the proposed model still recognizes emotions effectively when channels are missing.
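The generalized pre-training stage described above (randomly mask channels, then reconstruct them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the multi-view CNN-Transformer is replaced by a hypothetical `mean_fill_reconstructor` that predicts each masked channel as the mean of the visible ones, masked channels are zero-filled, the loss is computed only on masked channels (the usual masked-autoencoder convention, assumed here), and the 62-channel, 5-feature input and 0.5 mask ratio are illustrative choices.

```python
import random
from typing import List, Tuple

# Assumption (not from the paper): one EEG sample is a list of channels, each
# channel a short feature vector (e.g. one value per frequency band).
Sample = List[List[float]]


def mask_channels(x: Sample, mask_ratio: float,
                  rng: random.Random) -> Tuple[Sample, List[bool]]:
    """Randomly zero out whole channels; return the masked copy and the mask.

    Zero-filling is an assumed masking scheme; at least one channel is always
    left visible.
    """
    n = len(x)
    k = min(int(mask_ratio * n), n - 1)
    masked_idx = set(rng.sample(range(n), k))
    mask = [i in masked_idx for i in range(n)]
    x_masked = [[0.0] * len(ch) if m else list(ch) for ch, m in zip(x, mask)]
    return x_masked, mask


def mean_fill_reconstructor(x_masked: Sample, mask: List[bool]) -> Sample:
    """Hypothetical stand-in for the CNN-Transformer encoder-decoder:
    predict each masked channel as the mean of the visible channels."""
    visible = [ch for ch, m in zip(x_masked, mask) if not m]
    dim = len(x_masked[0])
    mean = [sum(ch[d] for ch in visible) / len(visible) for d in range(dim)]
    return [list(mean) if m else list(ch) for ch, m in zip(x_masked, mask)]


def masked_mse(x: Sample, x_hat: Sample, mask: List[bool]) -> float:
    """Mean squared reconstruction error over the masked channels only."""
    errs = [(a - b) ** 2
            for ch, ch_hat, m in zip(x, x_hat, mask) if m
            for a, b in zip(ch, ch_hat)]
    return sum(errs) / len(errs) if errs else 0.0


# Toy pre-training step on random data: 62 channels x 5 band features and a
# 50% mask ratio are illustrative assumptions.
rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(62)]
x_masked, mask = mask_channels(x, mask_ratio=0.5, rng=rng)
x_hat = mean_fill_reconstructor(x_masked, mask)
loss = masked_mse(x, x_hat, mask)
print(sum(mask))  # 31
print(f"masked-channel reconstruction loss: {loss:.3f}")
```

Restricting the loss to masked channels keeps the model from scoring well by simply copying its visible input. Under this sketch's assumptions, a test-time recording with missing channels could be passed through the same zero-fill path, which suggests why masked pre-training might help in the missing-channel setting the abstract describes.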