利用肌电图进行跨主体无声语音识别的简化对抗架构

Journal of neural engineering Pub Date : 2024-09-03 DOI:10.1088/1741-2552/ad7321

Qiang Cui, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Liang Xie, Ye Yan, Edmond Q Wu, Erwei Yin

{"title":"利用肌电图进行跨主体无声语音识别的简化对抗架构","authors":"Qiang Cui, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Liang Xie, Ye Yan, Edmond Q Wu, Erwei Yin","doi":"10.1088/1741-2552/ad7321","DOIUrl":null,"url":null,"abstract":"Objective. The decline in the performance of electromyography (EMG)-based silent speech recognition is widely attributed to disparities in speech patterns, articulation habits, and individual physiology among speakers. Feature alignment by learning a discriminative network that resolves domain offsets across speakers is an effective method to address this problem. The prevailing adversarial network with a branching discriminator specializing in domain discrimination renders insufficiently direct contribution to categorical predictions of the classifier.Approach. To this end, we propose a simplified discrepancy-based adversarial network with a streamlined end-to-end structure for EMG-based cross-subject silent speech recognition. Highly aligned features across subjects are obtained by introducing a Nuclear-norm Wasserstein discrepancy metric on the back end of the classification network, which could be utilized for both classification and domain discrimination. Given the low-level and implicitly noisy nature of myoelectric signals, we devise a cascaded adaptive rectification network as the front-end feature extraction network, adaptively reshaping the intermediate feature map with automatically learnable channel-wise thresholds. The resulting features effectively filter out domain-specific information between subjects while retaining domain-invariant features critical for cross-subject recognition.Main results. A series of sentence-level classification experiments with 100 Chinese sentences demonstrate the efficacy of our method, achieving an average accuracy of 89.46% tested on 40 new subjects by training with data from 60 subjects. Especially, our method achieves a remarkable 10.07% improvement compared to the state-of-the-art model when tested on 10 new subjects with 20 subjects employed for training, surpassing its result even with three times training subjects.Significance. Our study demonstrates an improved classification performance of the proposed adversarial architecture using cross-subject myoelectric signals, providing a promising prospect for EMG-based speech interactive application.","PeriodicalId":94096,"journal":{"name":"Journal of neural engineering","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A simplified adversarial architecture for cross-subject silent speech recognition using electromyography.\",\"authors\":\"Qiang Cui, Xingyu Zhang, Yakun Zhang, Changyan Zheng, Liang Xie, Ye Yan, Edmond Q Wu, Erwei Yin\",\"doi\":\"10.1088/1741-2552/ad7321\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Objective. The decline in the performance of electromyography (EMG)-based silent speech recognition is widely attributed to disparities in speech patterns, articulation habits, and individual physiology among speakers. Feature alignment by learning a discriminative network that resolves domain offsets across speakers is an effective method to address this problem. The prevailing adversarial network with a branching discriminator specializing in domain discrimination renders insufficiently direct contribution to categorical predictions of the classifier.Approach. To this end, we propose a simplified discrepancy-based adversarial network with a streamlined end-to-end structure for EMG-based cross-subject silent speech recognition. Highly aligned features across subjects are obtained by introducing a Nuclear-norm Wasserstein discrepancy metric on the back end of the classification network, which could be utilized for both classification and domain discrimination. Given the low-level and implicitly noisy nature of myoelectric signals, we devise a cascaded adaptive rectification network as the front-end feature extraction network, adaptively reshaping the intermediate feature map with automatically learnable channel-wise thresholds. The resulting features effectively filter out domain-specific information between subjects while retaining domain-invariant features critical for cross-subject recognition.Main results. A series of sentence-level classification experiments with 100 Chinese sentences demonstrate the efficacy of our method, achieving an average accuracy of 89.46% tested on 40 new subjects by training with data from 60 subjects. Especially, our method achieves a remarkable 10.07% improvement compared to the state-of-the-art model when tested on 10 new subjects with 20 subjects employed for training, surpassing its result even with three times training subjects.Significance. Our study demonstrates an improved classification performance of the proposed adversarial architecture using cross-subject myoelectric signals, providing a promising prospect for EMG-based speech interactive application.\",\"PeriodicalId\":94096,\"journal\":{\"name\":\"Journal of neural engineering\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of neural engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1088/1741-2552/ad7321\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of neural engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1088/1741-2552/ad7321","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

基于肌电图的无声语音识别性能下降的主要原因是说话者之间的语音模式、发音习惯和个体生理差异。通过学习能解决不同说话者之间领域偏移的判别网络进行特征对齐是解决这一问题的有效方法。目前流行的对抗网络带有一个专门从事领域分辨的分支分辨器，对分类器的分类预测没有足够的直接贡献。为此，我们提出了一种简化的基于差异的对抗网络，其端到端结构精简，适用于基于肌电图的跨主体无声语音识别。通过在分类网络的后端引入核规范 Wasserstein 差异度量，可获得跨主体的高度一致特征，该特征可用于分类和领域判别。鉴于肌电信号的低水平和隐含噪声特性，我们设计了一个级联自适应整流网络作为前端特征提取网络，利用可自动学习的通道阈值自适应重塑中间特征图。由此产生的特征能有效过滤掉不同受试者之间的特定领域信息，同时保留对跨受试者识别至关重要的领域不变特征。我们使用 100 个中文句子进行了一系列句子级分类实验，证明了我们方法的有效性，通过使用 60 个受试者的数据进行训练，在 40 个新受试者身上测试的平均准确率达到了 89.46%。特别是在使用 20 个受试者进行训练的情况下，在 10 个新受试者上进行测试时，我们的方法比最先进的模型显著提高了 10.07%，甚至超过了其三倍训练受试者的结果。我们的研究表明，利用跨受试者肌电信号的对抗结构提高了分类性能，为基于肌电信号的语音交互应用提供了广阔的前景。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A simplified adversarial architecture for cross-subject silent speech recognition using electromyography.

Objective. The decline in the performance of electromyography (EMG)-based silent speech recognition is widely attributed to disparities in speech patterns, articulation habits, and individual physiology among speakers. Feature alignment by learning a discriminative network that resolves domain offsets across speakers is an effective method to address this problem. The prevailing adversarial network with a branching discriminator specializing in domain discrimination renders insufficiently direct contribution to categorical predictions of the classifier.Approach. To this end, we propose a simplified discrepancy-based adversarial network with a streamlined end-to-end structure for EMG-based cross-subject silent speech recognition. Highly aligned features across subjects are obtained by introducing a Nuclear-norm Wasserstein discrepancy metric on the back end of the classification network, which could be utilized for both classification and domain discrimination. Given the low-level and implicitly noisy nature of myoelectric signals, we devise a cascaded adaptive rectification network as the front-end feature extraction network, adaptively reshaping the intermediate feature map with automatically learnable channel-wise thresholds. The resulting features effectively filter out domain-specific information between subjects while retaining domain-invariant features critical for cross-subject recognition.Main results. A series of sentence-level classification experiments with 100 Chinese sentences demonstrate the efficacy of our method, achieving an average accuracy of 89.46% tested on 40 new subjects by training with data from 60 subjects. Especially, our method achieves a remarkable 10.07% improvement compared to the state-of-the-art model when tested on 10 new subjects with 20 subjects employed for training, surpassing its result even with three times training subjects.Significance. Our study demonstrates an improved classification performance of the proposed adversarial architecture using cross-subject myoelectric signals, providing a promising prospect for EMG-based speech interactive application.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of neural engineering

自引率

0.00%

发文量