Chisco: An EEG-based BCI dataset for decoding of imagined speech.

IF 5.8 2区综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES Scientific Data Pub Date : 2024-11-21 DOI:10.1038/s41597-024-04114-1

Zihan Zhang, Xiao Ding, Yu Bao, Yi Zhao, Xia Liang, Bing Qin, Ting Liu

{"title":"Chisco: An EEG-based BCI dataset for decoding of imagined speech.","authors":"Zihan Zhang, Xiao Ding, Yu Bao, Yi Zhao, Xia Liang, Bing Qin, Ting Liu","doi":"10.1038/s41597-024-04114-1","DOIUrl":null,"url":null,"abstract":"<p><p>The rapid advancement of deep learning has enabled Brain-Computer Interfaces (BCIs) technology, particularly neural decoding techniques, to achieve higher accuracy and deeper levels of interpretation. Interest in decoding imagined speech has significantly increased because its concept akin to \"mind reading\". However, previous studies on decoding neural language have predominantly focused on brain activity patterns during human reading. The absence of imagined speech electroencephalography (EEG) datasets has constrained further research in this field. We present the Chinese Imagined Speech Corpus (Chisco), including over 20,000 sentences of high-density EEG recordings of imagined speech from healthy adults. Each subject's EEG data exceeds 900 minutes, representing the largest dataset per individual currently available for decoding neural language to date. Furthermore, the experimental stimuli include over 6,000 everyday phrases across 39 semantic categories, covering nearly all aspects of daily language. We believe that Chisco represents a valuable resource for the fields of BCIs, facilitating the development of more user-friendly BCIs.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1265"},"PeriodicalIF":5.8000,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific Data","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41597-024-04114-1","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}

引用次数: 0

Abstract

The rapid advancement of deep learning has enabled Brain-Computer Interfaces (BCIs) technology, particularly neural decoding techniques, to achieve higher accuracy and deeper levels of interpretation. Interest in decoding imagined speech has significantly increased because its concept akin to "mind reading". However, previous studies on decoding neural language have predominantly focused on brain activity patterns during human reading. The absence of imagined speech electroencephalography (EEG) datasets has constrained further research in this field. We present the Chinese Imagined Speech Corpus (Chisco), including over 20,000 sentences of high-density EEG recordings of imagined speech from healthy adults. Each subject's EEG data exceeds 900 minutes, representing the largest dataset per individual currently available for decoding neural language to date. Furthermore, the experimental stimuli include over 6,000 everyday phrases across 39 semantic categories, covering nearly all aspects of daily language. We believe that Chisco represents a valuable resource for the fields of BCIs, facilitating the development of more user-friendly BCIs.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Chisco：基于脑电图的 BCI 数据集，用于解码想象中的语音。

深度学习的快速发展使得脑机接口（BCI）技术，尤其是神经解码技术，能够实现更高精度和更深层次的解读。由于想象语音的概念类似于 "读心术"，人们对其解码的兴趣大增。然而，以往对神经语言解码的研究主要集中在人类阅读时的大脑活动模式。想象语音脑电图（EEG）数据集的缺乏限制了这一领域的进一步研究。我们提出了中国想象语音语料库（Chisco），其中包括 20,000 多句来自健康成年人的高密度想象语音脑电图记录。每个受试者的脑电图数据都超过 900 分钟，是迄今为止用于解码神经语言的最大单个数据集。此外，实验刺激包括 39 个语义类别的 6000 多个日常短语，几乎涵盖了日常语言的所有方面。我们相信，Chisco 是 BCIs 领域的宝贵资源，有助于开发更多用户友好型 BCIs。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Scientific Data Social Sciences-Education

CiteScore

11.20

自引率

4.10%

发文量

689

审稿时长

16 weeks

期刊介绍： Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than testing hypotheses or presenting new interpretations, methods, or in-depth analyses.