Pub Date: 2024-04-15 · DOI: 10.1109/OJCAS.2024.3358106
Yan Liu; Yuan Du; Yang Zhao
This special section of the IEEE Open Journal of Circuits and Systems (OJCAS) highlights a selection of papers from the 2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS). Because of COVID-19 and the associated travel restrictions, APCCAS 2022 was held as a hybrid conference from 11 to 13 November 2022 in Shenzhen, China. As the regional flagship conference of the IEEE Circuits and Systems (CAS) Society, APCCAS 2022 adopted the theme “Building a Fully-connected AIoT World” to emphasize the CAS Society’s potential to provide multidisciplinary solutions to societal and industrial challenges. The papers in this special issue were selected from those presented in the APCCAS 2022 sessions.
Title: Special Issue on Selected Papers From APCCAS 2022 · IEEE Open Journal of Circuits and Systems, vol. 5, pp. 55-56 · IEEE Xplore: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10500494
Pub Date: 2024-04-15 · DOI: 10.1109/OJCAS.2023.3317944
Chien-Cheng Tseng; Su-Ling Lee
This paper presents a method for computing the graph Fourier transform centrality (GFTC) of complex networks using graph filters. The conventional method computes GFTC scores with the dense (non-sparse) transform matrix of the graph Fourier transform (GFT). To reduce the computational complexity, a linear-algebra method based on the Frobenius norm of an error matrix converts the spectral-domain GFTC computation into a vertex-domain one, so that GFTC can be computed by polynomial graph filtering. Two kinds of graph-filter design are studied: a graph-aware method and a graph-unaware method. Complexity comparisons and experimental results show that the proposed graph-filter method is more computationally efficient than the conventional GFT method because its implementation structure exploits the sparsity of the Laplacian matrix. Finally, centrality computations on a social network, a metro network, and a sensor network demonstrate the effectiveness of the proposed method.
Title: Computation of Graph Fourier Transform Centrality Using Graph Filter · IEEE Open Journal of Circuits and Systems, vol. 5, pp. 69-80 · IEEE Xplore: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10500497
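Why a vertex-domain filter can replace the spectral-domain computation can be illustrated with a small sketch. When the spectral response is a polynomial, h(λ) = Σ_k a_k λ^k, the spectral filter U h(Λ) Uᵀ x equals the vertex-domain filter Σ_k a_k L^k x, and the latter needs only K sparse Laplacian-vector products instead of the dense GFT matrix U. Because U is orthogonal, the Frobenius norm of the error U h(Λ) Uᵀ - Σ_k a_k L^k also equals the root-sum-square of h(λ_i) - Σ_k a_k λ_i^k over the eigenvalues, which is one way to fit a polynomial to a non-polynomial response. This is not the paper's GFTC definition or its graph-aware/graph-unaware designs; the path graph (whose Laplacian eigenpairs have a closed form, so no eigensolver is needed), the coefficients, and the signal are illustration-only assumptions.

```python
import math
from typing import Callable, List


def laplacian_matvec(adj: List[List[int]], x: List[float]) -> List[float]:
    """Compute L x = D x - A x from adjacency lists; cost is O(|V| + |E|)."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(len(adj))]


def poly_graph_filter(adj: List[List[int]], coeffs: List[float], x: List[float]) -> List[float]:
    """Vertex-domain filter y = sum_k coeffs[k] * L^k x using only sparse mat-vecs."""
    y = [0.0] * len(x)
    power = list(x)  # holds L^k x, starting at k = 0
    for k, a in enumerate(coeffs):
        if k > 0:
            power = laplacian_matvec(adj, power)
        y = [yi + a * pi for yi, pi in zip(y, power)]
    return y


def path_graph(n: int) -> List[List[int]]:
    """Adjacency lists of the path graph 0 - 1 - ... - (n-1)."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


def spectral_filter_path(n: int, h: Callable[[float], float], x: List[float]) -> List[float]:
    """Spectral-domain filter U h(Lambda) U^T x for the path graph P_n.

    Uses the closed-form Laplacian eigenpairs of P_n:
    lambda_k = 2 - 2 cos(pi k / n), u_k(i) proportional to cos(pi k (i + 1/2) / n).
    """
    y = [0.0] * n
    for k in range(n):
        lam = 2.0 - 2.0 * math.cos(math.pi * k / n)
        u = [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        x_hat = sum(ui * xi for ui, xi in zip(u, x))  # GFT coefficient <u_k, x>
        for i in range(n):
            y[i] += h(lam) * x_hat * u[i]
    return y


if __name__ == "__main__":
    coeffs = [1.0, -0.5, 0.1]  # arbitrary example polynomial
    h = lambda lam: sum(a * lam ** k for k, a in enumerate(coeffs))
    x = [1.0, 0.0, 0.0, 2.0, 0.0, 3.0]
    print(poly_graph_filter(path_graph(6), coeffs, x))
    print(spectral_filter_path(6, h, x))  # same values up to rounding
```

The vertex-domain version never forms U; its cost grows with the number of edges and the polynomial order, which is the sparsity argument the abstract makes.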
Pub Date: 2024-04-12 · DOI: 10.1109/OJCAS.2024.3388210
Lennart Bamberg; Ardalan Najafi; Alberto Garcia-Ortiz
Specialized compute blocks have been developed for efficient neural-network (NN) execution. However, because of the vast amount of data and parameter movement, the interconnects and on-chip memories form another bottleneck that impairs power and performance. This work addresses this bottleneck with a low-power technique for edge-AI inference engines that combines overhead-free coding with a statistical analysis of neural-network data and parameters. For state-of-the-art benchmarks, the approach reduces the power consumption of the logic, interconnect, and memory blocks used for data storage and movement by up to 80%, while also saving up to 39% of the power of the compute blocks. These improvements come with no loss of accuracy and negligible hardware cost.
Title: Exploiting Neural-Network Statistics for Low-Power DNN Inference · IEEE Open Journal of Circuits and Systems, vol. 5, pp. 178-188 · IEEE Xplore: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10498075
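The abstract does not spell out the coding scheme, so the following is only an assumed illustration of why neural-network statistics matter for interconnect power. NN activations and weights tend to cluster around zero; in two's complement, a small value that changes sign toggles almost every wire of a bus, whereas a sign-magnitude word of the same width toggles only the sign bit and a few low-order bits. The conversion needs no extra wires, making it one example of an overhead-free code; whether it matches the authors' technique is an assumption. The sketch counts bit transitions on a hypothetical 8-bit bus.

```python
from typing import List

BITS = 8  # hypothetical bus width


def twos_complement(v: int, bits: int = BITS) -> int:
    """Bus word for v in two's complement."""
    return v & ((1 << bits) - 1)


def sign_magnitude(v: int, bits: int = BITS) -> int:
    """Bus word for v in sign-magnitude (MSB = sign bit)."""
    mag = abs(v)
    if mag >= 1 << (bits - 1):
        # -128 has no 8-bit sign-magnitude code; assume the quantizer clips to +/-127.
        raise ValueError("magnitude out of range")
    return (1 << (bits - 1) if v < 0 else 0) | mag


def bus_toggles(words: List[int]) -> int:
    """Total number of bit transitions between consecutive bus words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


if __name__ == "__main__":
    stream = [1, -1, 2, -2]  # small, zero-centred values, typical of NN data
    print(bus_toggles([twos_complement(v) for v in stream]))  # 20
    print(bus_toggles([sign_magnitude(v) for v in stream]))   # 5
```

Toggle count is used here as a proxy for dynamic power on the wires; real savings depend on the actual data statistics and wire capacitances.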
Pub Date: 2024-04-11 · DOI: 10.1109/OJCAS.2024.3387849
Ci-Hao Wu; Tian-Sheuan Chang
Transformer-based speech enhancement models yield impressive results. However, their heterogeneous and complex structure limits the potential for model compression, resulting in higher complexity and lower hardware efficiency. In addition, these models are not tailored to streaming and low-power applications. To address these challenges, this paper proposes a low-power streaming speech enhancement accelerator built through joint model and hardware optimization. The proposed high-performance model is optimized for hardware execution by co-designing model compression with the target application: the proposed domain-aware and streaming-aware pruning techniques reduce model size by 93.9%. Batch-normalization-based transformers further reduce the required latency. Softmax-free attention, complemented by an extra batch normalization, also simplifies the hardware design. The tailored hardware accommodates these diverse computing patterns by decomposing them into element-wise multiply-accumulate (MAC) operations on a 1-D processing array with configurable SRAM addressing, which minimizes hardware complexity and simplifies zero skipping. Implemented in a TSMC 40 nm CMOS process, the final design requires only 207.8K gates and 53.75 KB of SRAM, and consumes only 8.08 mW for real-time inference at 62.5 MHz.
Title: A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices · IEEE Open Journal of Circuits and Systems, vol. 5, pp. 128-140 · IEEE Xplore: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10496994
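Why softmax-free attention simplifies the hardware can be sketched as follows. Without the row-wise exponentials and divisions of softmax, attention reduces to two matrix products, so every step is a sequence of multiply-accumulate operations, and the products can even be reassociated as Q(KᵀV). At inference time, a batch normalization collapses to a per-channel scale and offset. The exact attention formulation, where the extra batch normalization sits, and all numeric values below are assumptions for illustration, not the paper's design.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product where every output is a sum of element-wise products (MACs)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_free_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """(Q K^T) V: no exponentials or row-wise normalization, only MACs."""
    return matmul(matmul(q, transpose(k)), v)


def softmax_free_attention_reassociated(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Q (K^T V): same result, but the intermediate is d x d instead of N x N."""
    return matmul(q, matmul(transpose(k), v))


def batchnorm_inference(x: Matrix, gamma: List[float], beta: List[float],
                        mean: List[float], var: List[float], eps: float = 1e-5) -> Matrix:
    """Inference-time batch norm per channel (column), folded into scale * x + offset."""
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    offset = [b - sc * m for b, sc, m in zip(beta, scale, mean)]
    return [[sc * xi + off for xi, sc, off in zip(row, scale, offset)] for row in x]


if __name__ == "__main__":
    q = [[1, 2], [0, 1], [3, -1]]  # N = 3 frames, d = 2 channels (toy values)
    k = [[2, 0], [1, 1], [0, 3]]
    v = [[1, 1], [2, 0], [0, 1]]
    a = softmax_free_attention(q, k, v)
    print(a == softmax_free_attention_reassociated(q, k, v))  # True (exact integer arithmetic)
    print(batchnorm_inference(a, [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))
```

Because the folded batch norm is just one multiply and one add per channel, it maps onto the same MAC datapath as the matrix products, which is consistent with the abstract's point about decomposing everything into element-wise MACs.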
Pub Date: 2024-03-30 · DOI: 10.1109/OJCAS.2024.3407663
Vassilis Alimisis; Dimitrios G. Arnaoutoglou; Emmanouil Anastasios Serlis; Argyro Kamperi; Konstantinos Metaxas; George A. Kyriacou; Paul P. Sotiriadis
A fall-detection system was implemented using a 2.45 GHz continuous-wave radar together with power-efficient, fully analog integrated classifier architectures. The Power Burst Curve and the effective acceleration were derived from the short-time Fourier transform and then processed by the analog classifier. The proposed classifier architectures are based on different approximations of the decision tree classification model and consist of three main building blocks: a sigmoid function circuit, an analog multiplier, and an argmax operator circuit. To assess the hardware design, a thorough analysis compares it with commonly used analog classifiers on the extracted data. The architectures were trained in Python and compared with software-based classifiers. The circuits were designed in TSMC’s 90 nm CMOS process technology, and the Cadence IC Suite was used for design, schematic implementation, and post-layout simulation.
Title: A Radar-Based System for Detection of Human Fall Utilizing Analog Hardware Architectures of Decision Tree Model · IEEE Open Journal of Circuits and Systems, vol. 5, pp. 224-242 · IEEE Xplore: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10542293
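The three building blocks named in the abstract suggest how a decision tree can be approximated in analog form; the sketch below assumes one such mapping and is not the authors' circuit. Each hard threshold test of a depth-2 tree is replaced by a sigmoid (the sigmoid circuit), each leaf's weight is the product of the sigmoid outputs along its path (the multipliers), and the predicted class is the argmax of the summed leaf weights per class (the argmax circuit). The features (assumed normalized to [0, 1]), thresholds, leaf labels, and sigmoid gain alpha are all hypothetical; as alpha grows, the soft tree approaches the hard one.

```python
import math
from typing import Dict, List, Tuple

Node = Tuple[int, float]  # (feature index, threshold)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hard_tree(x: List[float], nodes: List[Node], leaves: List[str]) -> str:
    """Reference depth-2 tree: nodes = [root, left child, right child]."""
    go_right = x[nodes[0][0]] > nodes[0][1]
    f, t = nodes[2] if go_right else nodes[1]
    return leaves[2 * go_right + (x[f] > t)]


def soft_tree_scores(x: List[float], nodes: List[Node], leaves: List[str],
                     alpha: float = 20.0) -> Dict[str, float]:
    """Soft depth-2 tree: sigmoids replace comparators, products give leaf weights."""
    s = [sigmoid(alpha * (x[f] - t)) for f, t in nodes]  # soft "go right" decisions
    leaf_weights = [
        (1 - s[0]) * (1 - s[1]),  # left, left
        (1 - s[0]) * s[1],        # left, right
        s[0] * (1 - s[2]),        # right, left
        s[0] * s[2],              # right, right
    ]
    scores: Dict[str, float] = {}
    for w, label in zip(leaf_weights, leaves):
        scores[label] = scores.get(label, 0.0) + w
    return scores


def soft_tree_classify(x: List[float], nodes: List[Node], leaves: List[str],
                       alpha: float = 20.0) -> str:
    scores = soft_tree_scores(x, nodes, leaves, alpha)
    return max(scores, key=scores.get)  # argmax over classes


if __name__ == "__main__":
    # Hypothetical: feature 0 = normalized burst power, feature 1 = normalized acceleration.
    nodes = [(0, 0.5), (1, 0.3), (1, 0.7)]
    leaves = ["no_fall", "fall", "no_fall", "fall"]
    print(soft_tree_classify([0.9, 0.9], nodes, leaves))  # fall
    print(hard_tree([0.9, 0.9], nodes, leaves))           # fall
```

The leaf weights always sum to 1, so the class scores behave like soft votes; a finite alpha only blurs decisions for inputs close to a threshold.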
Pub Date: 2024-03-28 · DOI: 10.1109/OJCAS.2024.3382355
Mohammad Javad Karimi; Menghe Jin; Catherine Dehollain; Alexandre Schmid
This paper presents a wireless power conversion system for biomedical implants with integrated automatic resonance tuning. The automatic tuning mechanism improves power transfer efficiency (PTE) by finely tuning the resonant frequency of the power link to maximize the rectified voltage. This adjustment ensures robust and reliable remote powering despite environmental changes and process variations, while also minimizing tissue exposure to power. On-chip switched capacitor arrays are connected in parallel with the resonant capacitor, and the system identifies the switched-capacitor combination that yields the highest rectified voltage by iterating over each combination. The proposed system is implemented and fabricated in a standard 180 nm CMOS technology with a total area of 0.339 mm², and its operation is verified. Measurement results show that the system tolerates mismatches equivalent to a 75 pF capacitance variation in the LC tank (±15% LC variation in this design). It improves PTE from 9.1% to 30.2% under high LC variation, and the tuning control consumes 154.7 µW