
Latest publications: 2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)

Toward A Real-Time Elliptic Curve Cryptography-Based Facial Security System
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090407
T. Tan, Hanho Lee
This paper presents a novel approach to a facial security system using elliptic curve cryptography. Face images extracted from input video are encrypted before being sent to a remote server: each pixel value of the detected face in the input video frame is mapped to a point on an elliptic curve, so the face image is completely encrypted. The original image can be recovered when needed using the elliptic curve decryption function. Specifically, we modify point multiplication designed for projective coordinates and apply the modified approach in affine coordinates to speed up the scalar point multiplication operation. Image encryption and decryption operations are also facilitated by our existing scheme. Simulation results in Visual Studio demonstrate that the proposed system accelerates encryption and decryption while maintaining information confidentiality.
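The pixel-to-point mapping and the encrypt/decrypt round trip can be sketched with a toy ElGamal-style scheme on a small curve. Everything here (the tiny curve over GF(17), the keys, the brute-force discrete-log recovery) is an illustrative assumption; the paper's modified point multiplication for affine coordinates is not reproduced.

```python
# Toy ElGamal-style EC encryption of a pixel value (illustrative sketch only).
# Curve: y^2 = x^3 + 2x + 2 over GF(17), generator G = (5, 1), group order 19,
# so only small "pixel" values (< 19) are representable in this toy example.
P_MOD, A = 17, 2
G, ORDER = (5, 1), 19
INF = None  # point at infinity

def inv(x):
    return pow(x, P_MOD - 2, P_MOD)  # Fermat inverse in GF(17)

def add(P, Q):  # affine point addition / doubling
    if P is INF: return Q
    if Q is INF: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % P_MOD == 0: return INF
    if P == Q:
        s = (3 * P[0] * P[0] + A) * inv(2 * P[1]) % P_MOD
    else:
        s = (Q[1] - P[1]) * inv(Q[0] - P[0]) % P_MOD
    x = (s * s - P[0] - Q[0]) % P_MOD
    return (x, (s * (P[0] - x) - P[1]) % P_MOD)

def mul(k, P):  # double-and-add scalar point multiplication
    R = INF
    while k:
        if k & 1: R = add(R, P)
        P = add(P, P); k >>= 1
    return R

def neg(P):
    return INF if P is INF else (P[0], (-P[1]) % P_MOD)

d = 7                # receiver's private key (assumed)
Qpub = mul(d, G)     # matching public key

def encrypt(pixel, r):
    M = mul(pixel, G)                    # map pixel value to a curve point
    return mul(r, G), add(M, mul(r, Qpub))

def decrypt(C1, C2):
    M = add(C2, neg(mul(d, C1)))         # M = C2 - d*C1
    # brute-force the tiny discrete log to recover the pixel value
    return next(k for k in range(ORDER) if mul(k, G) == M)

C1, C2 = encrypt(9, r=5)
assert decrypt(C1, C2) == 9
```

In a real system the mapping would run on a standardized curve large enough to cover the 0-255 pixel range, and recovery would use an invertible encoding rather than a discrete-log search.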
Citations: 1
An Energy Efficient Precision Scalable Computation Array for Neural Radiance Field Accelerator
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090268
Chaolin Rao, Haochuan Wan, Yueyang Zheng, Pingqiang Zhou, Xin Lou
Neural Radiance Field (NeRF), a recent advance in neural rendering, demonstrates impressive results for photo-realistic novel view synthesis. However, it faces challenges for deployment in practical rendering applications due to the large number of multiply-accumulate (MAC) operations. For hardware accelerator design, a precision-scalable MAC array, which supports computations at various precisions, can be used to optimize the power consumption of NeRF rendering accelerators. Recently, a variety of precision-scalable MAC arrays have been proposed to reduce the computational complexity of Convolutional Neural Networks (CNNs). However, most of them require considerable control logic to support different levels of precision. This paper proposes a precision-scalable MAC array with a serial mode, which supports multiplication with weights of different precisions over multiple cycles at little overhead. Implementation results show that the energy efficiency of the proposed MAC array is about 14.54 TOPS/W and 4.83 TOPS/W in the 4-bit and 8-bit computation modes, respectively, superior to other existing precision-scalable solutions.
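The serial-mode idea can be illustrated in software: feed the weight bits through the same datapath one bit per cycle and shift-accumulate, so a 4-bit weight takes 4 cycles and an 8-bit weight takes 8, with no extra multiplier hardware. This is a generic bit-serial MAC sketch (unsigned weights assumed), not the paper's circuit.

```python
def serial_mac(activations, weights, w_bits):
    """Bit-serial MAC: weight bits are consumed LSB-first over w_bits cycles,
    so one datapath handles 4-bit and 8-bit (unsigned) weights alike."""
    acc = 0
    for cycle in range(w_bits):                # one weight bit per cycle
        partial = sum(a * ((w >> cycle) & 1)   # AND-gate "multiply" by one bit
                      for a, w in zip(activations, weights))
        acc += partial << cycle                # shift-accumulate
    return acc

acts, wts = [3, 1, 2], [5, 7, 4]
assert serial_mac(acts, wts, 4) == sum(a * w for a, w in zip(acts, wts))  # 30
```

The precision/latency trade is explicit: halving `w_bits` halves the cycle count, which is the behavior a precision-scalable array exploits to save power.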
Citations: 0
Optimal Evasive Path Planning with Velocity Constraint
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090375
Karnika Biswas, Hakim Ghazzai, I. Kar, Y. Massoud
Pursuit evasion is an important category of mobile robotics applications related to surveillance, spying, and gathering ambient information. This paper presents a novel optimal approach to evasion planning that considers the physical limitations of the environment and the evader. The results show that the proposed formulation is applicable irrespective of the number of pursuing agents and of the relative velocities of the pursuers and the evader, in contrast to the traditional requirement that evasion strategies be configured for situation-dependent cases. The proposed policy is generic and can be implemented in real time through iterative optimization with model predictive controllers, the objective being to avoid capture or, at the least, to maximize the capture time.
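A minimal sketch of the underlying optimization: at each step the evader picks, within its speed limit, the move that maximizes the minimum distance to all pursuers. This is a one-step greedy stand-in for the paper's multi-step MPC horizon; the function name and grid search over headings are assumptions for illustration.

```python
import math

def evader_step(evader, pursuers, v_max, n_dirs=72):
    """One-step greedy evasion: choose the heading (at full speed v_max)
    that maximizes the minimum distance to any pursuer. A real MPC
    controller would optimize this objective over a multi-step horizon."""
    best, best_d = evader, -1.0
    for k in range(n_dirs):
        th = 2 * math.pi * k / n_dirs
        cand = (evader[0] + v_max * math.cos(th),
                evader[1] + v_max * math.sin(th))
        d = min(math.dist(cand, p) for p in pursuers)  # worst-case pursuer
        if d > best_d:
            best, best_d = cand, d
    return best

e = evader_step((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)], v_max=0.5)
# With pursuers to the east and north, the evader heads south-west.
assert e[0] < 0 and e[1] < 0
```

Note the velocity constraint enters only as the radius of the candidate set, which is why the formulation does not depend on how fast the pursuers are relative to the evader.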
Citations: 0
A Quantization Model Based on a Floating-point Computing-in-Memory Architecture
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090283
X. Chen, An Guo, Xinbing Xu, Xin Si, Jun Yang
Computing-in-memory (CIM) has been shown to deliver high energy efficiency and significant acceleration for neural networks with high computational parallelism. Floating-point numbers and floating-point CIM (FP-CIM) architectures are required for high-performance training and high-accuracy inference of neural networks. However, no prior work discusses the relationship between circuit design based on the FP-CIM architecture and neural networks. In this paper, we propose a quantization model based on an FP-CIM architecture to characterize this relationship in PyTorch. Based on the experimental results, we summarize several principles for FP-CIM macro design. Using our quantization model reduces data storage overhead by more than 70.0% and keeps the inference accuracy loss of floating-point networks within 0.5%, which is 1.7% better than integer networks.
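One common way a floating-point quantization model cuts storage is block floating point: a group of weights shares one exponent and each weight keeps only a short mantissa. The sketch below is a generic stand-in for that idea (the paper's exact FP-CIM quantization model is not public here); storing an 8-bit mantissa per weight plus one shared scale per block is roughly 4× (about 75%) smaller than FP32, consistent in spirit with the >70% figure.

```python
def bfp_quantize(weights, mant_bits=8, block=16):
    """Block-floating-point sketch: each block of weights shares one scale
    (acting as the exponent) and stores mant_bits-bit signed mantissas.
    Returns the dequantized weights so the rounding error is visible."""
    qmax = 2 ** (mant_bits - 1) - 1
    out = []
    for i in range(0, len(weights), block):
        blk = weights[i:i + block]
        scale = max(abs(w) for w in blk) / qmax or 1.0  # avoid divide-by-zero
        out += [round(w / scale) * scale for w in blk]  # quantize-dequantize
    return out

w = [0.8, -0.31, 0.05, 1.2, -0.9, 0.44, -0.02, 0.66]
wq = bfp_quantize(w, mant_bits=8, block=8)
max_err = max(abs(a - b) for a, b in zip(w, wq))
assert max_err <= (1.2 / 127) / 2 + 1e-12  # at most half an LSB of the scale
```

The worst-case error is half an LSB of the block's shared scale, which is why accuracy loss stays small as long as weight magnitudes within a block are similar.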
Citations: 0
An 8-T Processing-in-Memory SRAM Cell-Based Pixel-Parallel Array Processor for Vision Chips
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090359
Leyi Chen, Junxian He, Jianyi Yu, Haibing Wang, Jing Lu, Liyuan Liu, N. Wu, Cong Shi, Tian Min
A vision chip is a high-speed image processing device featuring a massively parallel pixel-level processing element (PE) array to boost pixel processing speed. However, the collocated processing unit and fine-grained data memory inside each PE impose a huge memory access bandwidth requirement as well as large area and energy consumption. To overcome this bottleneck, this paper proposes a fully custom 8T SRAM-based processing-in-memory (PIM) architecture that realizes a pixel-parallel array processor for high-speed, energy-efficient vision chips. The proposed PIM architecture embeds multiplexer-based computing circuits into a dual-port 8T SRAM array to form a PIM PE array. Each PIM PE holds a 66-bit block of 8T SRAM cells with embedded in-memory logic functions, of which 64 bits serve as the PE memory and 2 bits act as a buffer register in the PE. A fully custom physical layout of a 16×16 prototype PIM PE array is designed and simulated using a 65 nm CMOS technology. The simulation results demonstrate that the proposed PIM PE architecture achieves 200 MHz operation at 1.2 V and reaches a high energy efficiency of 3.97 TOPS/W while keeping a compact area of 0.129 mm².
Citations: 1
RRAM Computing-in-Memory Using Transient Charge Transferring for Low-Power and Small-Latency AI Edge Inference
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090254
Linfang Wang, Junjie An, Wang Ye, Weizeng Li, Hanghang Gao, Yangu He, Jianfeng Gao, Jinshan Yue, Lingyan Fan, C. Dou
RRAM-based computing-in-memory (CIM) can potentially improve the energy and area efficiency of AI edge processors, yet may still suffer performance degradation due to the large DC current and parasitic capacitance in the cell array during computation. In this work, we propose a new CIM design leveraging transient charge transferring (TCT) between the parasitic capacitors in a high-density, foundry-compatible two-transistor-two-resistor (2T2R) RRAM array, which performs DC-current-free multiply-and-accumulate (MAC) operations with improved energy efficiency, reduced latency, and enhanced signal margin. The TCT-CIM concept is demonstrated in silicon with a 180 nm 400 Kb RRAM test chip, which achieves a 7.36× power reduction compared to the conventional scheme and a measured read access time below 17.22 ns.
Citations: 0
A Power-Aware ECG Transmission Framework with Server Aided Lossless Compression
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090374
Jitumani Sarma, Rakesh Biswas
WBAN systems based on wearable sensor nodes are used to reduce individuals' life risk by detecting various cardiac anomalies via remote ECG signal monitoring. In this context, this paper presents a power-aware WBAN transmission system built on a server-aided ECG compression technique. A lossless compression technique is proposed to address the power consumption of a sensor node. The proposed approach employs frame-adaptive Golomb-Rice coding in coordination with k-means clustering at the remote server. The algorithm effectively achieves a similar compression ratio under different levels of noise in the digitized ECG signal. It is validated on ECG signals from the MIT-BIH arrhythmia database, yielding an average compression ratio of 2.89. The VLSI architecture of the proposed technique is implemented in TSMC 90 nm technology, consuming 65 µW with a 0.0049 mm² area overhead.
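Golomb-Rice coding itself is simple enough to sketch: each non-negative residual is split by a power-of-two parameter into a unary quotient and a k-bit binary remainder. The frame-adaptive choice of k (and the server-side k-means step) is the paper's contribution and is not reproduced; here k is just an argument.

```python
def rice_encode(values, k):
    """Golomb-Rice encode non-negative residuals with Rice parameter k
    (divisor M = 2**k): unary quotient, then k-bit binary remainder."""
    bits = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        bits += [1] * q + [0]                                  # unary quotient
        bits += [(r >> i) & 1 for i in range(k - 1, -1, -1)]   # MSB-first remainder
    return bits

def rice_decode(bits, k):
    out, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == 1:       # count the unary run
            q += 1; i += 1
        i += 1                    # skip the 0 terminator
        r = 0
        for _ in range(k):        # read the k remainder bits
            r = (r << 1) | bits[i]; i += 1
        out.append((q << k) | r)
    return out

residuals = [0, 3, 7, 12, 1]
assert rice_decode(rice_encode(residuals, 2), 2) == residuals
```

Small residuals cost few bits while large ones stay losslessly representable, which is exactly why the scheme suits ECG prediction residuals and why adapting k per frame keeps the ratio stable under varying noise.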
Citations: 0
A 55nm 32Mb Digital Flash CIM Using Compressed LUT Multiplier and Low Power WL Voltage Trimming Scheme for AI Edge Inference
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090358
Hongyang Hu, Zi Wang, Xiaoxin Xu, K. Xi, Kun Zhang, Junyu Zhang, C. Dou
In this work, we propose a digital flash computing-in-memory (CIM) architecture using a compressed lookup-table multiplier (CLUTM) and a low-power word-line voltage trimming (LP-WLVT) scheme. The proposed concept is highly compatible with standard commodity NOR flash memory. Compared to conventional lookup-table (LUT) multipliers, CLUTM yields a 32× reduction in area cost for 8-bit multiplication. The LP-WLVT scheme further reduces inference power by 14%. The concept is demonstrated in silicon on a 55 nm 32 Mb commercial flash memory, which performs 8-bit multiply-and-accumulate (MAC) with a throughput of 51.2 GOPS. It provides a 1.778 ms frame shift when running the TC-ResNet8 network, 5× more efficient than previous work. The CLUTM-based digital CIM architecture can play an important role in enabling commercial flash for highly efficient AI edge inference.
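The compression principle behind LUT multipliers can be shown with the classic decomposition: instead of one 2^16-entry table for 8×8-bit products, keep a 4×4-bit table (256 entries) and assemble the 8-bit product from four lookups and shift-adds. This is a generic LUT-decomposition sketch, not the paper's exact CLUTM scheme, and the achievable compression factor depends on the scheme.

```python
# 4-bit x 4-bit product table (256 entries), reused for 8-bit multiplies.
LUT4 = [[a * b for b in range(16)] for a in range(16)]

def lut_mul8(x, y):
    """8-bit multiply from four 4-bit LUT lookups and shift-adds:
    x*y = (xh*yh << 8) + ((xh*yl + xl*yh) << 4) + xl*yl."""
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    return (LUT4[xh][yh] << 8) + ((LUT4[xh][yl] + LUT4[xl][yh]) << 4) + LUT4[xl][yl]

# Exhaustive check over the full 8-bit operand range.
assert all(lut_mul8(x, y) == x * y for x in range(256) for y in range(256))
```

The table shrinks from 65536 to 256 entries at the cost of three extra adds, illustrating how a compressed LUT trades a little logic for a large memory-area saving.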
Citations: 1
A 27–29.5GHz 6-Bit Phase Shifter with 0.67–1.5 degrees RMS Phase Error in 65nm CMOS
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090332
Qin Duan, Zhijian Chen, Feng-yuan Mao, Y. Zou, Bin Li, Guangyin Feng, Yanjie Wang, Xiao-Ling Lin
A 27-29.5 GHz 6-bit switch-type phase shifter (PS) in a 65 nm CMOS process is presented in this paper. The PS cascades six phase-shift bits to realize a relative phase shift from 0° to 354.375° in steps of 5.625°. Novel design approaches for the phase-shift bits and their cascading sequence are proposed to improve the bandwidth and the RMS phase error. Post-layout simulation results show that the PS exhibits an ultra-low RMS phase error of 0.67°-1.5° and an RMS gain error of 0.63 dB-0.8 dB from 27 GHz to 29.5 GHz. The input and output return losses are both better than −10 dB, and the core size is 0.90 × 0.35 mm².
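The step and range figures follow directly from the 6-bit resolution, which a two-line check makes explicit:

```python
# 6-bit phase shifter arithmetic: 2**6 states over a 360-degree span.
bits = 6
states = 2 ** bits                 # 64 phase states
step = 360 / states                # degrees per LSB
max_shift = (states - 1) * step    # largest programmable shift
assert step == 5.625 and max_shift == 354.375
```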
Citations: 0
Low-Complexity Dynamic Single-Minimum Min-Sum Algorithm and Hardware Implementation for LDPC Codes
Pub Date : 2022-11-11 DOI: 10.1109/APCCAS55924.2022.10090379
Qinyuan Zhang, Suwen Song, Zhongfeng Wang
As a low-complexity decoding algorithm for low-density parity-check (LDPC) codes, the single-minimum min-sum (smMS) algorithm avoids finding the second minimum, instead estimating it by adding a fixed value to the minimum. However, the inaccurate estimation of the sub-minimum results in obvious performance degradation. In this work, we propose an improved smMS algorithm that adds a dynamic value to the minimum based on a special variable that is easy to compute and largely reflects the convergence degree of iterative decoding. The new algorithm is therefore called the dynamic smMS (dsmMS) algorithm. Compared to the standard normalized min-sum (NMS) algorithm, the performance gap for the (672,588) LDPC code is narrowed from 0.55 dB for smMS to 0.12 dB for dsmMS. We also present a partially parallel decoding architecture for the dsmMS algorithm and implement it in 55 nm CMOS technology with an area of 0.21 mm². Compared with a traditional NMS decoder, the proposed design reduces the total decoder area by 22%.
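The single-minimum idea is easiest to see at the check-node update, where each outgoing message normally needs the minimum over all *other* incoming magnitudes: the edge carrying the minimum would need the true second minimum, which smMS replaces with min + offset. The sketch below uses a fixed `offset` argument; making that offset dynamic from a convergence indicator is the paper's contribution and is not reproduced here.

```python
def check_node_smms(llrs, offset):
    """Single-minimum min-sum check-node update: out_i combines the product
    of the other signs with min over the other magnitudes, where the true
    second minimum is approximated as m1 + offset."""
    mags = [abs(x) for x in llrs]
    m1 = min(mags)
    idx = mags.index(m1)          # edge that carries the minimum
    m2_est = m1 + offset          # estimated second minimum
    sign_prod = 1
    for x in llrs:
        sign_prod *= 1 if x >= 0 else -1
    out = []
    for i, x in enumerate(llrs):
        mag = m2_est if i == idx else m1          # exclude each edge's own input
        s = sign_prod * (1 if x >= 0 else -1)     # product of the other signs
        out.append(s * mag)
    return out

# The minimum-magnitude edge (index 1) gets the estimated sub-minimum 1.25.
assert check_node_smms([2.0, -0.5, 1.5], offset=0.75) == [-0.5, 1.25, -0.5]
```

Only one minimum and one index are tracked per check node, which is the source of the hardware saving over a full two-minimum min-sum sorter.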
Citations: 1