Efficient CORDIC-Based Activation Functions for RNN Acceleration on FPGAs
Wan Shen, Junye Jiang, Minghan Li, Shuanglong Liu
IEEE Transactions on Artificial Intelligence, vol. 6, no. 1, pp. 199-210. Published 2024-10-07. DOI: 10.1109/TAI.2024.3474648
Citations: 0
Abstract
Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, have become standard tools for a wide range of time-series applications, such as natural language processing. However, deploying these models on edge devices is challenging because of their limited computational resources. Moreover, the implementation of RNN activation functions on low-end hardware strongly affects overall network performance, because activations account for the dominant share of execution time. In this work, we propose an efficient approach for implementing commonly used RNN activations based on an optimized coordinate rotation digital computer (CORDIC) algorithm. We further propose a unified hardware architecture for mapping the CORDIC-based method onto field-programmable gate arrays (FPGAs), which can be configured to implement multiple nonlinear activation functions. Because it requires fewer CORDIC iterations than existing methods, our architecture reduces computation time, making it particularly suitable for resource-constrained edge devices. Our design is implemented on a Xilinx Zynq-7000 device and evaluated on three RNN models and benchmark datasets. Experimental results demonstrate that our design achieves up to a 2× speedup over state-of-the-art designs while maintaining model accuracy.
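To illustrate the principle the abstract builds on, the sketch below evaluates tanh and sigmoid with the textbook hyperbolic CORDIC in rotation mode. It is not the authors' optimized, reduced-iteration design. The iteration count (`N_ITERS`), the base range (`BASE_RANGE`), the doubling-identity range extension, and all function names are hypothetical choices made for this example. It also uses floating point, whereas an FPGA datapath would use fixed-point shifts and adds. Deriving sigmoid from tanh shows one way a single CORDIC unit can serve several activations.

```python
import math

# Hypothetical parameters for this sketch (not taken from the paper).
N_ITERS = 16
BASE_RANGE = 1.1  # safely inside the ~1.118 hyperbolic CORDIC convergence limit


def hyperbolic_schedule(n_iters):
    """Shift indices for hyperbolic CORDIC: start at 1 and repeat
    indices 4, 13, 40, ... (k -> 3k + 1) so that the iteration converges."""
    schedule = []
    i, repeat = 1, 4
    while len(schedule) < n_iters:
        schedule.append(i)
        if i == repeat and len(schedule) < n_iters:
            schedule.append(i)  # repeated iteration
            repeat = 3 * repeat + 1
        i += 1
    return schedule


SCHEDULE = hyperbolic_schedule(N_ITERS)
# Precomputed rotation angles atanh(2^-i); in hardware these would sit in a small ROM.
ATANH_TABLE = [math.atanh(2.0 ** -i) for i in SCHEDULE]


def cordic_tanh_base(theta):
    """Rotation mode: drive z toward 0 with shift-and-add steps.
    y/x converges to tanh(theta); the CORDIC gain cancels in the ratio."""
    x, y, z = 1.0, 0.0, theta
    for i, alpha in zip(SCHEDULE, ATANH_TABLE):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x + d * y * shift, y + d * x * shift  # both updates use the old x, y
        z -= d * alpha
    return y / x  # hardware would need a divider here (e.g. linear-mode CORDIC)


def cordic_tanh(x):
    """Extend the input range by halving, then undo it with
    tanh(2u) = 2*tanh(u) / (1 + tanh(u)**2) (a choice for this sketch only)."""
    k = 0
    while abs(x) > BASE_RANGE:
        x /= 2.0
        k += 1
    t = cordic_tanh_base(x)
    for _ in range(k):
        t = 2.0 * t / (1.0 + t * t)
    return t


def cordic_sigmoid(x):
    """sigmoid(x) = (1 + tanh(x / 2)) / 2, so the tanh unit serves both."""
    return 0.5 * (1.0 + cordic_tanh(0.5 * x))


if __name__ == "__main__":
    for v in (-3.0, -0.5, 0.0, 0.8, 4.0):
        print(f"{v:5.1f}  tanh {cordic_tanh(v):+.5f} (ref {math.tanh(v):+.5f})  "
              f"sigmoid {cordic_sigmoid(v):.5f} (ref {1 / (1 + math.exp(-v)):.5f})")
```

With 16 iterations, the residual angle inside the base range is bounded by roughly atanh(2^-14), so the tanh error there is on the order of 1e-4 or below. Reducing the number of iterations shortens the shift-and-add chain and so cuts latency at the cost of accuracy. That is the trade-off the abstract says its optimized variant improves on.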