Pub Date: 2024-04-24 | DOI: 10.1016/j.vlsi.2024.102199
Hui Xu, Shuo Zhu, Ruijun Ma, Zhengfeng Huang, Huaguo Liang, Haojie Sun, Chaoming Liu
CMOS devices are increasingly affected by triple-node upsets (TNUs) as transistor feature sizes shrink, particularly in radiation environments. To address the shortcomings of existing radiation-hardened designs, including high overhead and high delay, this paper proposes a novel low-cost TNU self-recoverable latch. Simulation results show that, compared with existing TNU-hardened designs, the proposed latch reduces power consumption, delay, and power-delay product by 34.57%, 6.42%, and 34.98%, respectively.
Title: Design of novel low cost triple-node-upset self-recoverable hardened latch
Journal: Integration, the VLSI Journal (impact factor 1.9)
Pub Date: 2024-04-24 | DOI: 10.1016/j.vlsi.2024.102201
Lin Jiang, Anthony Dowling, Yu Liu, Ming-C. Cheng
An ensemble data-learning approach based on proper orthogonal decomposition (POD) and Galerkin projection (EnPOD-GP) is proposed for thermal simulation of multi-core CPUs, improving on the training efficiency and model accuracy of a previously developed global POD-GP method (GPOD-GP). GPOD-GP generates a single set of basis functions (POD modes) to capture the thermal response to variations in dynamic power maps (PMs) across the entire chip; covering the possible variations of all power sources makes its training computationally intensive. EnPOD-GP instead acquires multiple sets of POD modes, which significantly improves training efficiency and effectiveness, and its simulation accuracy is independent of any dynamic PM. Compared to finite element simulation, both GPOD-GP and EnPOD-GP offer a computational speedup of more than 3 orders of magnitude. For a processor with a small number of cores, GPOD-GP is the more efficient approach. When high accuracy is desired and/or the processor has more cores, EnPOD-GP is preferable in terms of training effort, simulation accuracy, and efficiency. Additionally, the error resulting from EnPOD-GP can be precisely predicted for any random spatiotemporal power excitation.
Title: Ensemble learning model for effective thermal simulation of multi-core CPUs
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167926024000658/pdfft?md5=1bfea626d6bed7a5cf9433aa649eaf0a&pid=1-s2.0-S0167926024000658-main.pdf
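To make the idea of a POD mode concrete, here is a minimal sketch of how the dominant mode can be extracted from a handful of temperature snapshots. It is not the EnPOD-GP or GPOD-GP method itself: it covers only the POD step (no Galerkin projection), uses the method of snapshots with plain power iteration, and the function name `leading_pod_mode` and the toy data are assumptions made for illustration.

```python
import math

def leading_pod_mode(snapshots, iters=200):
    """Return (mode, energy_fraction) for the dominant POD mode.

    snapshots: list of equal-length lists, one sampled field per time step.
    Method of snapshots: find the leading eigenvector of the small
    correlation matrix C[i][j] = <u_i, u_j> by power iteration, then
    combine the snapshots with those weights.
    """
    n = len(snapshots)
    corr = [[sum(a * b for a, b in zip(snapshots[i], snapshots[j]))
             for j in range(n)] for i in range(n)]
    total = sum(corr[i][i] for i in range(n))  # trace = total "energy"
    if total == 0.0:
        raise ValueError("all snapshots are zero")

    v = [1.0 / math.sqrt(n)] * n  # start vector (hypothetical choice)
    eig = 0.0
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        eig = math.sqrt(sum(x * x for x in w))  # converges to the top eigenvalue
        if eig == 0.0:
            raise ValueError("start vector lies in the null space")
        v = [x / eig for x in w]

    # Spatial mode = weighted sum of snapshots, normalised to unit length.
    size = len(snapshots[0])
    raw = [sum(v[i] * snapshots[i][k] for i in range(n)) for k in range(size)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw], eig / total

if __name__ == "__main__":
    # Toy data: every snapshot is a scaled copy of the same spatial shape,
    # so one mode should capture essentially all of the energy.
    shots = [[1.0, 2.0, 2.0], [2.0, 4.0, 4.0], [3.0, 6.0, 6.0]]
    mode, energy = leading_pod_mode(shots)
    print(mode, energy)
```

In a reduced-order thermal model, a small number of such modes would serve as the basis onto which the heat equation is projected; how many modes to keep, and how the ensembles of modes are organised, is specific to the paper and not reproduced here.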
Pub Date: 2024-04-23 | DOI: 10.1016/j.vlsi.2024.102197
Mohamed Salah Azzaz, Redouane Kaibou, Bachir Madani
This paper presents a new encryption system designed and implemented for real-time speech transmission, with the goals of reducing bandwidth requirements, increasing security, and minimizing residual intelligibility. To guarantee robustness and lightweight computation, the cryptosystem operates in the wavelet transform domain and uses a hyperchaotic model to generate mask and permutation keys. It was designed using a hardware-software (HW/SW) co-design approach, developing several IP cores in a relatively short development time. The performance and security of the system were validated first through simulation and then experimentally, by transmitting an encrypted speech signal between two low-cost Nexys-4 DDR FPGA platforms operating in real time over both wired and wireless links. Compared to similar works, the system achieves high bandwidth efficiency due to the use of the discrete wavelet transform (DWT), limited FPGA resource usage, low power consumption, and a high security level, with a keyspace large enough to resist brute-force attacks. The designed system can be a useful solution for many real-time secure integrated voice communication systems and for military, professional, or personal conversations that require a high level of security.
Title: Co-design based FPGA implementation of an efficient new speech hyperchaotic cryptosystem in the transform domain
Pub Date: 2024-04-20 | DOI: 10.1016/j.vlsi.2024.102195
Zhe Sun, Zimeng Zhou, Fang-Wei Fu
With the growing use of IoT devices, the memory subsystem, as the performance and energy bottleneck of IoT systems, has received a lot of attention. A key component is on-chip memory, which bridges the performance gap between the CPU and main memory. While many off-the-shelf embedded processors use a hybrid on-chip memory architecture containing scratchpad memories (SPMs) and caches, most existing literature ignores the collaboration between caches and SPMs. This paper proposes static SPM allocation strategies for this hybrid architecture in IoT systems, which aim to minimize the overall latency and/or energy consumption of the instruction memory subsystem. We capture intra- and inter-task cache conflict misses via a fine-grained temporal cache behavior model. Based on this cache conflict information, we propose an integer linear programming (ILP) formulation that generates an optimal static function-level SPM allocation for system performance. Furthermore, to improve the scalability of the allocation scheme for very large task sets, we introduce an interference factor that quantifies the impact of interference. Based on the interference factor, we present two approximate knapsack-based heuristic algorithms that provide near-optimal static allocation schemes at both function-level and basic-block-level granularity, which favors fast design space exploration. The experimental results demonstrate that the proposed solution achieves a 30.85% improvement in memory performance and up to a 31.39% reduction in energy consumption compared to the existing function-level SPM allocation scheme. In addition, the proposed basic-block-level allocation algorithm performs better than both our function-level allocation algorithm and other basic-block-level allocation algorithms.
Title: Optimizing code allocation for hybrid on-chip memory in IoT systems
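The knapsack view of SPM allocation mentioned in the abstract above can be illustrated with a small sketch. This is an exact 0/1 knapsack solved by dynamic programming on a toy instance, not the paper's approximate heuristics or its ILP; the function name `allocate_spm`, the size units, and the "benefit" numbers (standing in for whatever latency or energy gain an interference-aware model would assign) are all assumptions.

```python
def allocate_spm(functions, capacity):
    """Choose functions to place in the scratchpad memory.

    functions: list of (name, size, benefit) tuples, with integer sizes.
    capacity:  SPM capacity in the same units as the sizes.
    Returns (total_benefit, sorted list of chosen names).
    """
    # best[c] = (benefit, names) achievable with at most c units of SPM.
    best = [(0, []) for _ in range(capacity + 1)]
    for name, size, benefit in functions:
        # Walk capacities downward so each function is used at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + benefit
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    total, chosen = best[capacity]
    return total, sorted(chosen)

if __name__ == "__main__":
    # Hypothetical code objects: (name, size, estimated benefit).
    funcs = [("isr", 4, 40), ("fft", 3, 50), ("crc", 2, 30), ("log", 5, 10)]
    print(allocate_spm(funcs, capacity=6))  # (80, ['crc', 'fft'])
```

The same routine applies unchanged at basic-block granularity if each tuple describes a basic block instead of a function; only the number of items grows, which is where heuristic approximations become attractive.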
Pub Date: 2024-04-15 | DOI: 10.1016/j.vlsi.2024.102196
Yu Xie, Qiang Lai
Memristors are widely used in neural networks because their memory function resembles that of synapses. This paper constructs a memristive neural network (MNN) with distinctive structure and dynamic behavior, consisting of four cyclic neurons and one unidirectional memristive synapse. We explore its dynamic behaviors, including asymmetric coexisting attractors and parameter-dependent large-scale amplitude control. Specifically, we find four different types of asymmetric coexisting attractors: coexisting double-point, periodic, or chaotic attractors, and coexisting periodic and chaotic attractors. To reveal the characteristics of large-scale amplitude control, we use analysis methods such as phase plane plots and time series. The existence of this phenomenon is closely related to the system parameters and initial values. A circuit experiment is also implemented to verify the feasibility of the design.
Title: A memristive neural network with features of asymmetric coexisting attractors and large-scale amplitude control
Pub Date: 2024-04-04 | DOI: 10.1016/j.vlsi.2024.102192
Herbadji Djamel, Abderrahmane Herbadji, Ismail haddad, Hichem Kahia, Aissa Belmeguenai, Nadir Derouiche
This work aims to improve the chaotic behavior of the classical logistic chaotic system for voice encryption. The enhanced map has several advantages over many existing chaotic maps (including 1D and 2D maps): a wider chaotic range, greater unpredictability, and better ergodicity. The effectiveness of the improved chaotic system was verified using the bifurcation diagram, the NIST SP 800-22 tests, and the Lyapunov exponent. On this basis, an efficient tweakable voice encryption algorithm is proposed to protect the security of digital voice transmission. The proposed scheme combines a confusion-diffusion architecture with pre-processing of the speech signal that automatically removes silent or voiceless segments, so that only the relevant parts of speech are encrypted. This significantly reduces both computing time and resource requirements. With the aid of the tweak, each original voice signal can yield multiple different encrypted versions under the same secret key, which saves time and costs less than changing the key. These features make the proposed speech encryption algorithm suitable for real-time communication. The encryption system is shown to withstand known/chosen-plaintext attacks effectively, and the experimental results demonstrate that the proposed algorithm can withstand several other types of attacks. The results shed new light on data security in voice transmission.
Title: An enhanced logistic chaotic map based tweakable speech encryption algorithm
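As background for the entry above, the classical logistic map and its Lyapunov exponent can be sketched in a few lines. This shows only the well-known baseline map x_{n+1} = r·x_n·(1 − x_n), not the enhanced map proposed in the paper (whose definition is not given in the abstract); the function name and the default parameters are assumptions.

```python
import math

def logistic_lyapunov(r, x0=0.3, burn_in=1000, steps=100_000):
    """Estimate the Lyapunov exponent of x -> r * x * (1 - x).

    The exponent is the orbit average of ln|f'(x)| with f'(x) = r * (1 - 2x).
    A positive value indicates chaos; a negative value indicates a stable
    fixed point or periodic orbit.
    """
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        slope = abs(r * (1.0 - 2.0 * x))
        # Guard against log(0) if the orbit lands exactly on x = 0.5.
        total += math.log(max(slope, 1e-300))
    return total / steps

if __name__ == "__main__":
    for r in (2.5, 3.2, 3.9):
        print(r, logistic_lyapunov(r))
```

For r = 2.5 the orbit settles on the fixed point x = 0.6, where |f'(x)| = 0.5, so the estimate is close to ln 0.5. For r = 4 the exact exponent of the classical map is known to be ln 2; the narrow range of r values with a positive exponent is the usual motivation for "enhanced" maps with a wider chaotic range.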
Pub Date: 2024-04-01 | DOI: 10.1016/j.vlsi.2024.102193
Yassine Attaoui, Mohamed Chentouf, Zine El Abidine Alaoui Ismaili, Aimad El Mourabit
ASIC designs are growing in complexity, and power, performance, and area (PPA) targets are pushed to the limit. The lack of physical information at the early design stages hinders precise timing predictions and may lead to design re-spins. In previous work, we improved timing prediction at the post-placement stage using the Random Forest model, achieving 91.25% cell delay accuracy. Building upon this, we further investigate the potential of ensemble tree-based algorithms, specifically “Extremely Randomized Trees” and “Gradient Boosting”, to close the gap in cell delay accuracy. In this paper, we enrich the training dataset with new 16 nm industrial designs. The results demonstrate a substantial improvement, with an average cell delay accuracy of 92.01%, and 84.26% on unseen data. The average root-mean-square error is reduced from 12.11 to 3.23, and to 7.76 on unseen data.
Title: Enhancing cell delay accuracy in post-placed netlists using ensemble tree-based algorithms
Pub Date: 2024-04-01 | DOI: 10.1016/j.vlsi.2024.102191
Yavar Safaei Mehrabani, Reza Faghih Mirzaee
As the number of transistors on a chip increases, power consumption becomes an increasingly serious concern. The approximate computing paradigm is a promising way to bridge the gap between resource-constrained gadgets and computation-intensive applications. This paper presents four efficient approximate full adder cells based on dynamic logic and carbon nanotube field-effect transistors (CNFETs). To the best of our knowledge, dynamic logic has never before been deployed in the design of approximate full adders. Comprehensive simulations and analyses are conducted to study the efficacy of the new circuits. Simulation results indicate remarkable improvements compared to state-of-the-art circuits. For instance, at a 0.9 V power supply, our final proposed design improves the power-delay-area product (PDAP) metric by at least 63% compared to its peers. Moreover, the applicability of the proposed adders to image sharpening is examined by measuring the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) in MATLAB. The proposed designs also perform reasonably well in this regard.
Title: DAFA: Dynamic approximate full adders for high area and energy efficiency
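The PSNR metric used above to judge image quality under approximate arithmetic follows a standard definition, sketched here in Python rather than MATLAB. The function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions for illustration; SSIM is more involved and is not shown.

```python
import math

def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two same-sized grayscale images.

    reference, test: lists of rows of pixel intensities.
    peak: maximum possible pixel value (255 for 8-bit images).
    """
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for ref_px, test_px in zip(ref_row, test_row):
            squared_error += (ref_px - test_px) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

if __name__ == "__main__":
    exact = [[10, 20], [30, 40]]    # output of an exact adder pipeline (toy)
    approx = [[10, 20], [30, 50]]   # same image with one approximate result
    print(psnr(exact, approx))
```

A higher PSNR means the approximate result is closer to the exact one, so the metric gives a single number for how much image quality is traded for the power, delay, and area savings.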
Pub Date: 2024-03-26 | DOI: 10.1016/j.vlsi.2024.102190
Qingping Zhang, Wenfa Zhan, Xiaoqing Wen
A die-level design-for-test architecture for 3D stacked ICs is proposed. The main component of this architecture is a newly proposed configurable boundary cell, which enables flexible parallel testing. Both the number of parallel scan chains and their lengths can be configured during test. The architecture is lightweight, modular, and compatible with IEEE P1149.1, and offers high flexibility in parallel test configuration. Both infrastructure and implementation aspects are illustrated. Experimental results demonstrate the desired test acceleration: the acceleration ratio approaches its limit, which equals the number of parallel scan chains, when the number of test vectors exceeds 300.
Title: A new die-level flexible design-for-test architecture for 3D stacked ICs