Pub Date: 2025-06-26. DOI: 10.1109/OJCAS.2024.3518754
Yifei Zhu;Zhenxuan Luan;Dawei Feng;Weiwei Chen;Lei Ren;Zhangxi Tan
The escalating demand for high-performance and energy-efficient electronics has propelled 3D integrated circuits (3D ICs) as a promising solution. However, a major obstacle has been the lack of specialized electronic design automation (EDA) software and standardized design flows for 3D chiplets. To bridge the gap, we introduce Open3DFlow, an open-source design platform for 3D ICs. It is a seven-step workflow that incorporates essential ASIC back-end processes while supporting multi-physics analysis, such as through-silicon via (TSV) modeling, thermal analysis, and signal integrity (SI) evaluation. To illustrate all functionalities of Open3DFlow, we use it to implement a 3D RISC-V CPU design with a vertically stacked L2 cache on a separate die. We harden both the CPU logic and the 3D-cache die in a GlobalFoundries 0.18 µm (GF180) process with open-source PDK support. We enable face-to-face (F2F) coupling of the top and bottom die by constructing a bonding layer based on the original technology file. Open3DFlow's open-source nature allows seamless integration of custom AI optimization algorithms. As a showcase, we leverage large language models (LLMs) to guide bonding pad placement. In addition, we apply LLMs to back-end Tcl script generation to improve design productivity. We expect Open3DFlow to open up a brand-new paradigm for future 3D IC innovations.
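The geometric constraint behind F2F bonding pad placement can be made concrete with a small sketch. In face-to-face stacking the top die is flipped onto the bottom die, so a pad at (x, y) on the top die must meet its partner at the mirrored x-coordinate. The helper names and coordinates below are illustrative assumptions, not Open3DFlow's actual API.

```python
# Illustrative sketch (not Open3DFlow's API): after an F2F flip about the
# vertical axis, a top-die pad at (x, y) lands at (die_width - x, y) in the
# bottom die's coordinate frame, so paired pads must satisfy that mapping.

def mirror_pad(x, y, die_width):
    """Map a top-die pad location to its bottom-die partner after the flip
    (die_width is the die's x-extent, e.g., in microns)."""
    return (die_width - x, y)

def aligned(top_pad, bottom_pad, die_width, tol=1.0):
    """Check that two pads overlap within an alignment tolerance (microns)."""
    mx, my = mirror_pad(top_pad[0], top_pad[1], die_width)
    return abs(mx - bottom_pad[0]) <= tol and abs(my - bottom_pad[1]) <= tol

print(mirror_pad(100.0, 250.0, 1000.0))                 # (900.0, 250.0)
print(aligned((100.0, 250.0), (900.0, 250.0), 1000.0))  # True
```

A placement tool (or, as in the showcase, an LLM proposing pad sites) only has to satisfy this mirrored-overlap check for every pad pair.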
Published as "Revolutionize 3D-Chip Design With Open3DFlow, an Open-Source AI-Enhanced Solution," IEEE Open Journal of Circuits and Systems, vol. 6, pp. 169-180.
Pub Date: 2025-06-18. DOI: 10.1109/OJCAS.2025.3580744
Inès Winandy;Arnaud Dion;Florent Manni;Pierre-Loïc Garoche;Dorra Ben Khalifa;Matthieu Martel
Precision tuning of fixed-point arithmetic is a powerful technique for optimizing hardware designs on FPGAs, where computing resources and memory are often severely constrained. While fixed-point arithmetic offers significant performance and area advantages over floating-point implementations, deriving an appropriate fixed-point representation remains a challenging task. In particular, developers must carefully select the number of bits assigned to the integer and fractional parts of each variable to balance accuracy and resource consumption. In this article, we introduce an original precision tuning technique for synthesizing fixed-point programs from floating-point code, specifically targeting FPGA platforms. The distinguishing feature of our technique lies in its formal approach to error analysis: it systematically propagates numerical errors through computations to infer variable-specific fixed-point formats that guarantee user-specified accuracy bounds. Unlike heuristic or ad hoc methods, our technique provides formal guarantees on the final accuracy of the generated code, ensuring safe deployment on hardware platforms. To enable hardware-friendly implementations, the resulting fixed-point programs use the ap_fixed data types provided by High-Level Synthesis (HLS) tools, allowing fine-grained control over the precision of each variable. Our method has been implemented within the POPiX 2.0 framework, which automatically generates optimized fixed-point code ready for synthesis. Experimental results on a set of embedded benchmarks show that our fixed-point codes require fewer machine cycles than floating-point codes in most cases when compiled for an FPGA with AMD's state-of-the-art HLS compiler. Our generated fixed-point codes also reduce hardware resource usage, such as LUTs, flip-flops, and DSP blocks, with typical reductions ranging from 67% to 83% compared to double-precision floating-point codes, depending on the application.
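The per-variable error bound at the heart of such an analysis is easy to state: a value stored with n fractional bits is quantized to a multiple of 2^-n, so round-to-nearest introduces at most 2^-(n+1) of error, and a formal analysis propagates such bounds through the computation. The sketch below illustrates only that quantization step (a hypothetical helper, not the POPiX 2.0 implementation).

```python
# Minimal sketch of fixed-point quantization with n fractional bits:
# the representable grid has spacing 2**-n, so the round-to-nearest
# error is bounded by half a grid step, 2**-(n+1).

def to_fixed(x, frac_bits):
    """Round x to the nearest value representable with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

x = 3.14159
for n in (4, 8, 12):
    q = to_fixed(x, n)
    assert abs(q - x) <= 2.0 ** -(n + 1)  # the formal round-off bound holds
    print(n, q)
```

Choosing n per variable is then a trade-off: each extra fractional bit halves the bound but widens the datapath (more LUTs, flip-flops, and DSP bits).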
Published as "Automated Fixed-Point Precision Optimization for FPGA Synthesis," IEEE Open Journal of Circuits and Systems, vol. 6, pp. 192-204.
Pub Date: 2025-04-10. DOI: 10.1109/OJCAS.2025.3559774
Jiovana S. Gomes;Mateus Grellert;Fábio L. L. Ramos;Sergio Bampi
The pervasive presence of video content has spurred the development of advanced technologies to manage, process, and deliver high-quality content efficiently. Video compression is crucial in providing high-quality video services under limited network and storage capacities, traditionally achieved through hybrid codecs. However, as these frameworks reach a performance bottleneck with compression gains becoming harder to achieve with conventional methods, Deep Neural Networks (DNNs) offer a promising alternative. By leveraging DNNs’ nonlinear representation capacity, these networks can enhance compression efficiency and visual quality. Neural Video Coding (NVC) has recently received significant attention, with Neural Image Coding models surpassing traditional codecs in compression ratios. Therefore, this survey explores the state-of-the-art in NVC, examining recent works, frameworks, and the potential of this innovative approach to revolutionize video compression. We identify that NVC models have come a long way since the first proposals and currently are on par in compression efficiency with the latest hybrid codec, VVC. Still, many improvements are required to enable the practical usage of NVC, such as hardware-friendly development to enable faster inference and execution on mobile and energy-constrained devices.
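End-to-end neural codecs of the kind this survey covers are typically trained against a Lagrangian rate-distortion objective, L = R + λ·D, trading estimated bits against reconstruction error. The sketch below shows that objective with illustrative numbers; the operating points and λ are assumptions, not figures from the survey.

```python
# Rate-distortion cost commonly used to train neural codecs:
# L = R + lambda * D, where R is the rate (e.g., bits per pixel)
# and D the distortion (e.g., MSE). Lower cost is better.

def rd_loss(rate_bpp, mse, lam):
    """Lagrangian rate-distortion cost."""
    return rate_bpp + lam * mse

# Two hypothetical operating points of one codec:
coarse = rd_loss(rate_bpp=0.10, mse=40.0, lam=0.01)  # 0.10 + 0.40 = 0.50
fine   = rd_loss(rate_bpp=0.50, mse=5.0,  lam=0.01)  # 0.50 + 0.05 = 0.55
print(coarse, fine)  # at this lambda, the low-rate point wins
```

Sweeping λ traces out the codec's rate-distortion curve, which is how NVC models are compared against hybrid codecs such as VVC.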
Published as "End-to-End Neural Video Compression: A Review," IEEE Open Journal of Circuits and Systems, vol. 6, pp. 120-134.
Pub Date: 2025-04-04. DOI: 10.1109/OJCAS.2025.3557835
Sung-June Byun;Byeong-Gi Jang;Jong-Wan Jo;Dae-Young Choi;Young-Gun Pu;Sang-Sun Yoo;Seok-Kee Kim;Yeon-Jae Jung;Kang-Yoon Lee
This paper presents a bidirectional four-switch buck-boost (FSBB) converter with a high-voltage (HV) gate driver for use in power bank applications. The HV gate driver is integrated into the converter for increased efficiency. Thus, the proposed buck-boost converter can reduce conduction loss over a wide input voltage range by reducing the on-resistance of the external MOSFETs using a gate-source voltage (VGS) of 5 V or 10 V. The chip is fabricated in a 130 nm 1P5M bipolar-CMOS-DMOS HV process with laterally diffused MOSFET (LDMOS) options and has a die size of 2.7 × 2.7 mm². The proposed architecture achieves a maximum output power of 40 W. Measurement results show that the maximum efficiencies at VGS of 5 V and 10 V are 96.67% and 98.15%, respectively.
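The efficiency argument is back-of-the-envelope arithmetic: conduction loss in a MOSFET is P = I_rms² · R_on, and a stronger gate drive (VGS = 10 V rather than 5 V) lowers R_on. The sketch below uses illustrative R_on values, not figures from the paper.

```python
# Why a higher gate drive helps: conduction loss scales linearly with the
# MOSFET on-resistance, which drops as VGS increases. R_on values below are
# hypothetical datasheet-style numbers for illustration only.

def conduction_loss(i_rms, r_on):
    """MOSFET conduction loss in watts: I_rms^2 * R_on."""
    return i_rms ** 2 * r_on

i_rms = 4.0                        # amperes
r_on_5v, r_on_10v = 0.012, 0.008   # ohms at VGS = 5 V and 10 V (assumed)

p5 = conduction_loss(i_rms, r_on_5v)    # 0.192 W
p10 = conduction_loss(i_rms, r_on_10v)  # 0.128 W
print(p5, p10)
```

At tens of watts of output, shaving even a fraction of a watt of conduction loss is what moves peak efficiency from the 96% range toward 98%.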
Published as "Design of a High Efficiency Bi-Directional Four-Switch Buck-Boost Converter With HV Gate Driver for Multi-Cell Battery Power Bank Applications," IEEE Open Journal of Circuits and Systems, vol. 6, pp. 110-119.
Pub Date: 2025-03-28. DOI: 10.1109/OJCAS.2025.3573989
Georgios G. Roumeliotis;Jan Desmet;Jos Knockaert
This paper presents an application of the Haar wavelet operational matrix to the numerical inverse Laplace transform, combined with the intrinsically convenient Haar wavelet transform of an arbitrary time-domain signal. A case study of the transient- and steady-state behavior of the input impedance of a short-circuited transmission line showcases a method for performing the numerical inverse Laplace transform of fractional-order approximative expressions of the skin effect. Furthermore, an improved skin-effect approximation is presented.
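The building block the paper leans on is the discrete Haar wavelet transform. One level of the standard average/difference construction is sketched below; this is the textbook transform only, not the paper's operational-matrix machinery.

```python
# One level of the discrete Haar wavelet transform: pairwise averages give
# the approximation coefficients and pairwise differences the details,
# scaled by 1/sqrt(2) so the transform is orthonormal (energy-preserving).
import math

def haar_level(x):
    """Split a length-2n signal into n approximation and n detail coefficients."""
    assert len(x) % 2 == 0
    s = math.sqrt(2.0)
    approx = [(a + b) / s for a, b in zip(x[0::2], x[1::2])]
    detail = [(a - b) / s for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

x = [4.0, 2.0, 5.0, 7.0]
a, d = haar_level(x)
# Orthonormality means signal energy is preserved across the transform:
energy = sum(v * v for v in a) + sum(v * v for v in d)
print(a, d, energy)  # energy == 16 + 4 + 25 + 49 == 94
```

Recursing on the approximation coefficients yields the full multilevel transform whose matrix form underlies the operational-matrix approach.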
Published as "Circuit Simulation of Any Time-Domain Source on Fractional-Order Impedances by Use of the Haar Wavelet Transform, Case Study of the Skin Effect," IEEE Open Journal of Circuits and Systems, vol. 6, pp. 155-168.
Pub Date: 2025-03-08. DOI: 10.1109/OJCAS.2025.3568058
Hanyoung Lee;Ardianto Satriawan;Hanho Lee
Fully Homomorphic Encryption (FHE) allows computational processing of encrypted data on cloud servers, providing high security and enabling safe data utilization. As homomorphic multiplications are performed on encrypted data, noise accumulates, requiring a process called bootstrapping to restore the noise level of the new ciphertext $ct'$. Bootstrapping involves linear transformation steps, such as Coefficients-to-Slots and Slots-to-Coefficients, in which most of the operations are rotations. A rotation shifts the elements in the slots to new positions based on a rotation index $k$. However, the computational cost and memory bandwidth required for a rotation add significant overhead and limit the ability to perform FHE operations. Therefore, an efficient implementation of rotation is crucial for high-performance FHE applications. To address this problem, we optimized the rotation datapath of the CKKS scheme to be hardware-friendly and propose a homomorphic evaluation cluster hardware accelerator tailored for FHE workloads. Our architecture is aware of the computational and memory constraints of field-programmable gate arrays (FPGAs) and performs the number theoretic transform (NTT), its inverse (INTT), key multiplication, base conversion, and automorphism in a single cluster. We implemented our design on the AMD Alveo U280 FPGA platform. With a polynomial length of $2^{16}$ and operating at 250 MHz as a rotation accelerator, the FPGA implementation shows a speed-up of about 700× compared to the CPU implementation in OpenFHE. Compared to a GPU implementation, it shows a 1.77× speed-up, and compared to previous FPGA implementations, a 1.13× improvement.
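What "rotation by k" means at the slot level can be shown in a few lines: it is a cyclic shift of the slot vector, realized homomorphically by a Galois automorphism plus a key switch. The plaintext view below is illustrative only; a real implementation (e.g., OpenFHE) operates on ciphertext polynomials.

```python
# Plaintext view of CKKS rotation: a cyclic shift of the slots. On
# ciphertexts this is implemented as the ring automorphism X -> X^(5^k mod 2n)
# followed by key switching, which is what makes it expensive in hardware.

def rotate_slots(slots, k):
    """Cyclic left shift of the slot vector by rotation index k."""
    k %= len(slots)
    return slots[k:] + slots[:k]

def galois_element(k, n):
    """Automorphism exponent for rotation by k with ring degree n (modulus 2n)."""
    return pow(5, k, 2 * n)

print(rotate_slots([1, 2, 3, 4], 1))  # [2, 3, 4, 1]
print(galois_element(1, 16))          # 5
```

The shift itself is trivial; the cost the accelerator attacks is the NTT/INTT, key multiplication, and base conversion surrounding the key switch.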
Published as "Homomorphic Evaluation Cluster Architecture for Fully Homomorphic Encryption," IEEE Open Journal of Circuits and Systems, vol. 6, pp. 135-146.
Pub Date: 2025-02-27. DOI: 10.1109/OJCAS.2025.3546464
Maximilian Scherzer;Mario Auer
In this article, an integrated fully differential current amplifier is presented. It was designed for inductive sensor excitation, in this case for a fluxgate sensor; however, the concept is applicable wherever a low-noise, precise current is required. A brief review of some basic elements of the circuit is given, followed by the development of a model that takes into account output impedance limitations due to mismatch as well as stability criteria, an essential consideration in the design of a stable current amplifier for inductive loads. Based on the proposed model, the design and implementation of the current amplifier are outlined, identifying potential difficulties for on-chip integration. The final design was fabricated in a standard 180 nm CMOS technology. Measurement results show that the circuit draws only 2.8 mA from a 3.3 V supply and occupies a total area of 0.64 mm². Special efforts were made to accurately evaluate the output impedance, for which a value of 436 kΩ was recorded. In addition, the current amplifier achieves an output-referred noise current of 2.5 nA/√Hz, resulting in a measured signal-to-noise ratio of more than 105.2 dB over a 512 Hz bandwidth at an output current of 9 mA peak-to-peak.
Published as "An Integrated Fully Differential Current Amplifier With Frequency Compensation for Inductive Sensor Excitation," IEEE Open Journal of Circuits and Systems, vol. 6, pp. 147-154.
Pub Date: 2025-02-26. DOI: 10.1109/OJCAS.2025.3546067
Amirhossein Rostami;Seyed Mohammad Ali Zeinolabedin;Liyuan Guo;Florian Kelber;Heiner Bauer;Andreas Dixius;Stefan Scholze;Marc Berthel;Dennis Walter;Johannes Uhlig;Bernhard Vogginger;Christian Mayr
Over the last few years, online training of deep neural networks (DNNs) on edge and mobile devices has attracted increasing interest in practical use cases due to their adaptability to new environments, personalization, and privacy preservation. Despite these advantages, online learning on resource-restricted devices is challenging. This work demonstrates a 16-bit floating-point, flexible, power- and memory-efficient neural learning unit (NLU) that can be integrated into processors to accelerate the learning process. To achieve this, we implemented three key strategies: a dynamic control unit, a tile allocation engine, and a neural compute pipeline, which together enhance data reuse and improve the flexibility of the NLU. The NLU was integrated into a system-on-chip (SoC) featuring a 32-bit RISC-V core and memory subsystems, fabricated using GlobalFoundries 22 nm FDSOI technology. The design occupies just 0.015 mm² of silicon area and consumes only 0.379 mW of power. The results show that the NLU can accelerate the training process by up to 24.38× and reduce energy consumption by up to 37.37× compared to a RISC-V implementation with a floating-point unit (FPU). Additionally, compared to a state-of-the-art RISC-V with a vector coprocessor, the NLU achieves 4.2× higher energy efficiency (measured in GFLOPS/W). These results demonstrate the feasibility of our design for edge and IoT devices, positioning it favorably among state-of-the-art on-chip learning solutions. Furthermore, we performed mixed-precision on-chip training from scratch for keyword-spotting tasks using the Google Speech Commands (GSC) dataset. Training on just 40% of the dataset, the NLU achieved a training accuracy of 89.34% with stochastic rounding.
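Stochastic rounding, the rounding mode the NLU's mixed-precision training relies on, rounds up with probability equal to the fractional part, making the result unbiased in expectation. The sketch below shows rounding to an integer grid; the NLU's fp16 datapath applies the same idea to floating-point mantissa bits.

```python
# Stochastic rounding to the integer grid: x is rounded up with probability
# equal to its fractional part, so E[round(x)] == x. This keeps tiny gradient
# updates from being systematically lost, unlike round-to-nearest.
import math
import random

def stochastic_round(x, rng):
    f = x - math.floor(x)
    return math.floor(x) + (1 if rng.random() < f else 0)

rng = random.Random(0)
n = 100_000
mean = sum(stochastic_round(0.3, rng) for _ in range(n)) / n
print(mean)  # close to 0.3: unbiased in expectation
```

With deterministic round-to-nearest, 0.3 would round to 0 every time and its contribution would vanish; stochastically it survives on average.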
Published as "NLU: An Adaptive, Small-Footprint, Low-Power Neural Learning Unit for Edge and IoT Applications," IEEE Open Journal of Circuits and Systems, vol. 6, pp. 85-99.
Pub Date : 2025-02-26DOI: 10.1109/OJCAS.2025.3545904
Konstantinos Metaxas;Vassilis Alimisis;Costas Oustoglou;Yannis Kominis;Paul P. Sotiriadis
A comprehensive nonlinear analysis of autonomous and periodically forced fully-differential, negative-resistor LC oscillators is presented. Through nonlinear transformations in the state space, it is shown that oscillators within this class exhibit qualitatively similar dynamical behavior in terms of their limit cycles and bifurcation curves, at least within an open region containing the origin. The case of autonomous, complementary BJT oscillators is used to validate the qualitative analysis and to demonstrate a general approach for numerically extending the bifurcation curves away from the equilibrium point and determining the oscillatory conditions. When an external periodic force is present, we focus on the special case of periodically multiplicatively-forced fully-differential, negative-resistor LC oscillators and use Harmonic Balance techniques to derive analytical expressions that estimate the locking range in the weak-injection regime. The results are used to calculate the locking range of a harmonically forced complementary BJT oscillator, yielding explicit expressions closely aligned with experimental measurements and thus verifying the validity of the analysis.
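For intuition about the weak-injection locking range discussed above: the paper derives its exact expressions via Harmonic Balance, but the classic Adler approximation gives the same qualitative scaling and is a convenient generic stand-in. The tank frequency, quality factor, and injection ratio below are illustrative values, not figures from the paper:

```python
def adler_lock_range(f0_hz, q_factor, inj_ratio):
    """One-sided injection-locking range under Adler's weak-injection
    approximation: delta_f ~= (f0 / (2*Q)) * (I_inj / I_osc).
    Valid only when the injected current is much smaller than the
    oscillator's own tank current (inj_ratio << 1)."""
    return f0_hz / (2.0 * q_factor) * inj_ratio

# e.g., a 2.4 GHz LC tank with Q = 10 and 1% injection current:
print(adler_lock_range(2.4e9, 10.0, 0.01))  # -> 1200000.0  (1.2 MHz one-sided)
```

The linear dependence on the injection ratio and the inverse dependence on Q are the hallmarks of the weak-injection regime; outside it, the full nonlinear analysis of the paper is required.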
{"title":"Nonlinear Analysis of Differential LC Oscillators and Injection Locked Frequency Dividers","authors":"Konstantinos Metaxas;Vassilis Alimisis;Costas Oustoglou;Yannis Kominis;Paul P. Sotiriadis","doi":"10.1109/OJCAS.2025.3545904","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3545904","url":null,"abstract":"A comprehensive nonlinear analysis of autonomous and periodically forced fully-differential, negative-resistor LC oscillators is presented. Through nonlinear transformations in the state space, it is shown that oscillators within this class exhibit qualitatively similar dynamical behavior in terms of their limit cycles and bifurcation curves, at least within an open region containing the origin. The case of autonomous, complementary BJT oscillators is used to validate the qualitative analysis and demonstrate a general approach of how to numerically extend the bifurcation curves away from the equilibrium point and determine the oscillatory conditions. When external periodic force is present, we focus on the special case of periodically multiplicatively-forced fully-differential, negative-resistor, LC oscillators and use Harmonic Balance techniques to derive analytical expressions estimating the locking range in the weak injection regime. 
The results are used to calculate the locking range of a harmonically forced complementary BJT oscillator yielding explicit expressions closely aligned with experimental measurements, thus verifying the validity of the analysis.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"100-109"},"PeriodicalIF":2.4,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10904493","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143637833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-04DOI: 10.1109/OJCAS.2025.3538707
Yufei Xiao;Kai Cai;Xiaohu Ge;Yong Xiao
Energy consumption evaluation for data processing tasks, such as encoding and decoding, is a critical consideration in designing very-large-scale integration (VLSI) circuits. Incorporating both information-theoretic and circuit perspectives, a new general energy consumption model is proposed to capture the energy consumption of channel decoder circuits. For the binary erasure channel, lower bounds on energy consumption are derived for two-dimensional (2D) and three-dimensional (3D) decoder circuits under specified error probabilities, along with scaling rules for energy consumption in each case. Based on the proposed model, the lower bounds on energy consumption for staged serial and parallel implementations are derived, and a specific threshold value is identified that determines whether parallel or serial decoding is preferable in decoder circuits. Staged serial implementations in 3D decoder circuits achieve higher energy efficiency than fully parallel implementations when the processed data exceed 48 bits. Simulation results further demonstrate that the energy efficiency of 3D decoders improves with increasing data volume. When the number of input bits is 648, 1296, and 1944, the energy consumption of 3D decoders is reduced by 11.58%, 13.07%, and 13.86% compared to 2D decoders, respectively. The energy consumption of 3D decoders surpasses that of 2D decoders when the decoding error probability falls below a specific threshold of 0.035492.
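The reported serial-vs-parallel crossover can be summarized as a simple selection rule. The 48-bit threshold is the number quoted in the abstract for 3D decoder circuits; the function itself is only an illustrative wrapper around that result, not the paper's energy model:

```python
def choose_3d_decoder_style(n_bits, crossover_bits=48):
    """Per the reported result, staged-serial 3D decoding becomes more
    energy-efficient than fully parallel decoding once the processed
    data exceed `crossover_bits` (48 bits in the abstract)."""
    return "staged-serial" if n_bits > crossover_bits else "fully-parallel"

# the block lengths evaluated in the abstract all fall on the serial side:
for n in (32, 648, 1296, 1944):
    print(n, choose_3d_decoder_style(n))
```

The three evaluated block lengths (648, 1296, 1944) are well past the crossover, consistent with the staged-serial energy reductions reported above.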
{"title":"Energy Consumption Modeling of 2-D and 3-D Decoder Circuits","authors":"Yufei Xiao;Kai Cai;Xiaohu Ge;Yong Xiao","doi":"10.1109/OJCAS.2025.3538707","DOIUrl":"https://doi.org/10.1109/OJCAS.2025.3538707","url":null,"abstract":"Energy consumption evaluation for data processing tasks, such as encoding and decoding, is a critical consideration in designing very large scale integration (VLSI) circuits. Incorporating both information theory and circuit perspectives, a new general energy consumption model is proposed to capture the energy consumption of channel decoder circuits. For the binary erasure channel, lower bounds of energy consumption are derived for two-dimensional (2D) and three-dimensional (3D) decoder circuits under specified error probabilities, along with scaling rules for energy consumption in each case. Based on the proposed model, the lower bounds of energy consumption for staged serial and parallel implementations are derived, and a specific threshold value is identified to determine the parallel or serial decoding in decoder circuits. Staged serial implementations in 3D decoder circuits achieve a higher energy efficiency than fully parallel implementations when the processed data exceed 48 bits. Simulation results further demonstrate that the energy efficiency of 3D decoders improves with increasing data volume. When the number of input bits is 648, 1296 and 1944, the energy consumption of 3D decoders is reduced by 11.58%, 13.07%, and 13.86% compared to 2D decoders, respectively. The energy consumption of 3D decoders surpasses that of 2D decoders when the decoding error probability falls below a specific threshold of 0.035492. 
These results provide a foundational framework and benchmarks for analyzing and optimizing the energy consumption of 2D and 3D channel decoder circuits, enabling more efficient VLSI circuit designs.","PeriodicalId":93442,"journal":{"name":"IEEE open journal of circuits and systems","volume":"6 ","pages":"74-84"},"PeriodicalIF":2.4,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10870295","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}