Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet Implementation for Edge Motor-Imagery Brain-Machine Interfaces

2020 IEEE International Conference on Smart Computing (SMARTCOMP). Pub Date: 2020-04-24. DOI: 10.1109/SMARTCOMP50058.2020.00065

Tibor Schneider, Xiaying Wang, Michael Hersche, L. Cavigelli, L. Benini


Citations: 16

Abstract

Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines by analyzing brain activity recorded with Electroencephalography (EEG). Latency, reliability, and privacy constraints make offloading the computation to the cloud unsuitable. Practical use cases demand a wearable, battery-operated device with low average power consumption for long-term use. Recently, sophisticated algorithms, in particular deep learning models, have emerged for classifying EEG signals. While reaching outstanding accuracy, these models often exceed the limitations of edge devices due to their memory and computational requirements. In this paper, we demonstrate algorithmic and implementation optimizations for EEGNet, a compact Convolutional Neural Network (CNN) suitable for many BMI paradigms. We quantize weights and activations to 8-bit fixed-point with a negligible accuracy loss of 0.4% on 4-class MI, and present an energy-efficient hardware-aware implementation on the Mr. Wolf parallel ultra-low power (PULP) System-on-Chip (SoC) by utilizing its custom RISC-V ISA extensions and 8-core compute cluster. With our proposed optimization steps, we obtain an overall speedup of 64× and a reduction of up to 85% in memory footprint with respect to a single-core layer-wise baseline implementation. Our implementation takes only 5.82 ms and consumes 0.627 mJ per inference. With 21.0 GMAC/s/W, it is 256× more energy-efficient than an EEGNet implementation on an ARM Cortex-M7 (0.082 GMAC/s/W).
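To illustrate what quantizing weights and activations to 8 bits involves, the following is a minimal sketch of symmetric uniform quantization in plain Python. It is not the authors' scheme: the abstract does not state how the scaling factors are chosen or whether quantization-aware training is used, so the function names (`quantize_int8`, `dequantize`), the `max_abs` clipping range, and the example weights are all assumptions made for illustration.

```python
def quantize_int8(values, max_abs):
    """Map floats to signed 8-bit integer levels in [-127, 127].

    Assumption: one symmetric range [-max_abs, max_abs] for the whole tensor;
    values outside that range are clipped (saturated).
    """
    scale = max_abs / 127.0  # real-valued size of one integer step
    levels = []
    for v in values:
        level = round(v / scale)                    # nearest integer level
        levels.append(max(-127, min(127, level)))   # saturate to the 8-bit range
    return levels, scale


def dequantize(levels, scale):
    """Recover approximate real values from integer levels."""
    return [q * scale for q in levels]


# Hypothetical weights, chosen only to exercise rounding and clipping.
weights = [0.5, -0.25, 1.0, -1.2, 0.0]
levels, scale = quantize_int8(weights, max_abs=1.0)
restored = dequantize(levels, scale)
```

For values inside the clipping range, the round-trip error is bounded by half a quantization step (`scale / 2`); values outside it, such as `-1.2` here, saturate at the range limit. Storing each weight in one byte instead of a 32-bit float is one plausible contributor to the memory reduction reported above, though the abstract does not break that figure down.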


