Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet Implementation for Edge Motor-Imagery Brain-Machine Interfaces

Tibor Schneider, Xiaying Wang, Michael Hersche, L. Cavigelli, L. Benini
DOI: 10.1109/SMARTCOMP50058.2020.00065 (https://doi.org/10.1109/SMARTCOMP50058.2020.00065)
Published in: 2020 IEEE International Conference on Smart Computing (SMARTCOMP)
Publication date: 2020-04-24
Citations: 16

Abstract

Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines by analyzing brain activities recorded with Electroencephalography (EEG). Latency, reliability, and privacy constraints make it unsuitable to offload the computation to the cloud. Practical use cases demand a wearable, battery-operated device with low average power consumption for long-term use. Recently, sophisticated algorithms, in particular deep learning models, have emerged for classifying EEG signals. While reaching outstanding accuracy, these models often exceed the limitations of edge devices due to their memory and computational requirements. In this paper, we demonstrate algorithmic and implementation optimizations for EEGNET, a compact Convolutional Neural Network (CNN) suitable for many BMI paradigms. We quantize weights and activations to 8-bit fixed-point with a negligible accuracy loss of 0.4% on 4-class MI, and present an energy-efficient hardware-aware implementation on the Mr. Wolf parallel ultra-low power (PULP) System-on-Chip (SoC) by utilizing its custom RISC-V ISA extensions and 8-core compute cluster. With our proposed optimization steps, we can obtain an overall speedup of 64× and a reduction of up to 85% in memory footprint with respect to a single-core layer-wise baseline implementation. Our implementation takes only 5.82 ms and consumes 0.627 mJ per inference. With 21.0 GMAC/s/W, it is 256× more energy-efficient than an EEGNET implementation on an ARM Cortex-M7 (0.082 GMAC/s/W).
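The abstract does not say how the 8-bit quantization is carried out (how value ranges are chosen, or whether the network is retrained). As a rough illustration only, the following sketch shows one common scheme, symmetric uniform quantization with a fixed range. The function names, the `max_abs` parameter, and the sample weights are hypothetical and are not taken from the paper.

```python
def quantize_int8(values, max_abs):
    """Symmetric uniform quantization of floats to signed 8-bit integers.

    Assumption: values are expected to lie in [-max_abs, max_abs];
    anything outside that range is clipped.
    """
    scale = max_abs / 127.0  # real-valued size of one integer step
    quantized = []
    for v in values:
        q = round(v / scale)          # nearest integer level
        q = max(-127, min(127, q))    # clip to the symmetric 8-bit range
        quantized.append(q)
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer levels back to approximate real values."""
    return [q * scale for q in quantized]


# Hypothetical weights; the last one lies outside the range and is clipped.
weights = [0.30, -0.25, 0.10, -1.20]
q, scale = quantize_int8(weights, max_abs=1.0)
restored = dequantize(q, scale)

for w, qi, r in zip(weights, q, restored):
    print(f"{w:+.3f} -> {qi:+4d} -> {r:+.4f}")
```

For values inside the range, the round-trip error is at most half a quantization step (`scale / 2`); values outside it saturate at ±127, which is why the choice of range matters for accuracy.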