Pub Date: 2024-05-08 | DOI: 10.1016/j.vlsi.2024.102205
Liwen Zhang, He Yang, Chen Yang, Jincan Zhang, Jinchan Wang
Single-objective, single-parameter optimization is commonly used to improve the transmission characteristics of TSV structures, but it rarely yields a design that meets multiple target requirements simultaneously, and it cannot optimize different design parameters at the same time. To address these problems, a global optimization method based on the grey wolf optimization (GWO) algorithm and an artificial neural network (ANN) model is proposed. For the presented mixed dielectric coaxial-annular TSV model, six key design parameters (A-F) are first selected as optimization variables using the control variable method. An L₂₅(5⁶) orthogonal experiment is designed for Taguchi analysis and analysis of variance (ANOVA). Three prediction models, ANN, support vector machine (SVM), and extreme learning machine (ELM), are then trained on the extended orthogonal data, and the ANN model performs best. To search for the global optimum, the genetic algorithm (GA) and the GWO algorithm are each combined with the ANN model. The results show that GWO is better than GA at avoiding local optima and converges faster and more stably. After GWO-ANN optimization, every S-parameter metric improves at 30 GHz: S11 decreases by 14.05 dB, S21 increases by 0.33 dB, and S31 decreases by 12.50 dB.
Published in Integration, the VLSI Journal, vol. 97, Article 102205: "Optimal design of mixed dielectric coaxial-annular TSV using GWO algorithm based on artificial neural network".
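The GWO search step can be sketched as follows. This is a minimal, generic grey wolf optimizer in its standard form (alpha, beta, and delta leaders, with the coefficient a decreasing linearly from 2 to 0), not the authors' code. The objective `surrogate_loss` and its `target` optimum are hypothetical stand-ins for the trained ANN's prediction of the S-parameter objective, and the six variables on [0, 1] stand in for normalized parameters A-F.

```python
import random

def gwo_minimize(objective, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimal grey wolf optimizer for box-constrained minimization."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(value, d):
        lo, hi = bounds[d]
        return min(max(value, lo), hi)

    def best_three(candidates):
        # Alpha, beta, delta: the three best (fitness, position) pairs seen so far
        return sorted(candidates, key=lambda fp: fp[0])[:3]

    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = best_three([(objective(w), w) for w in wolves])

    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # exploration coefficient, decreases linearly from 2 to 0
        new_wolves = []
        for w in wolves:
            position = []
            for d in range(dim):
                total = 0.0
                for _, leader in leaders:
                    A = a * (2 * rng.random() - 1)
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    total += leader[d] - A * D
                position.append(clip(total / 3, d))  # average of the three leader-guided moves
            new_wolves.append(position)
        wolves = new_wolves
        leaders = best_three(leaders + [(objective(w), w) for w in wolves])

    best_fitness, best_position = leaders[0]
    return best_position, best_fitness


# Hypothetical stand-in for the ANN surrogate: six normalized parameters A-F in [0, 1]
target = [0.2, 0.5, 0.8, 0.35, 0.6, 0.9]

def surrogate_loss(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

best_x, best_f = gwo_minimize(surrogate_loss, [(0.0, 1.0)] * 6)
```

In a real flow, `surrogate_loss` would call the trained ANN and combine the predicted S11, S21, and S31 into one scalar; how the paper weights those targets is not stated here.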
Pub Date: 2024-05-07 | DOI: 10.1016/j.vlsi.2024.102204
Zahra Hashemi, Mostafa Yargholi
This paper proposes a CMOS fully differential multipath two-stage operational transconductance amplifier (OTA) with boosted slew rate and power efficiency. The proposed OTA consists of two gain stages and is built on the recycling folded cascode (RFC) structure. Applying the multipath technique in the first stage increases gain and reduces power consumption, and a high-speed current mirror increases the phase margin. The second stage uses a class-AB amplifier to increase the output transconductance and slew rate. Overall, the proposed OTA is more power efficient than the recycling double-folded cascode (RDFC) OTA, which makes it better suited to low-power applications such as neural amplifiers. The OTA is designed and simulated in a 0.18 μm standard CMOS technology with a 1 V supply. Post-layout simulations show that it dissipates 180 nW while achieving a 136.7 dB voltage gain and a 127.1 kHz unity-gain frequency with a 30 pF capacitive load. Compared with the RDFC OTA, the proposed OTA provides a 250% increase in slew rate and a 20% increase in PSRR and CMRR, while reducing power consumption by 10%. The OTA is robust against process, voltage, and temperature (PVT) variations and achieves a better figure of merit (FOM) than previous OTAs.
Published in Integration, the VLSI Journal, vol. 97, Article 102204: "Design of CMOS fully differential multipath two-stage OTA with boosted slew rate and power efficiency".
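The reported numbers can be turned into a small-signal figure of merit with a few lines of arithmetic. This sketch assumes the common definition FOM = UGF · C_L / I_total (in MHz·pF/mA) and that the 180 nW is drawn entirely from the 1 V supply, so I_total = 180 nA. The paper's own FOM definition may differ, so the resulting value is illustrative only.

```python
def fom_small_signal(ugf_mhz, c_load_pf, i_total_ma):
    """Small-signal FOM = UGF * C_L / I_total in MHz*pF/mA (a common definition, assumed here)."""
    return ugf_mhz * c_load_pf / i_total_ma

vdd_v = 1.0
power_w = 180e-9
i_total_ma = power_w / vdd_v * 1e3      # assumes all power comes from the 1 V supply: 180 nA
gain_vv = 10 ** (136.7 / 20)            # 136.7 dB expressed as a voltage ratio
fom = fom_small_signal(ugf_mhz=0.1271, c_load_pf=30.0, i_total_ma=i_total_ma)

print(f"I_total = {i_total_ma * 1e6:.0f} nA, gain = {gain_vv:.3g} V/V, FOM = {fom:.0f} MHz*pF/mA")
```

Under these assumptions the gain corresponds to roughly 6.8 × 10^6 V/V, and the FOM comes out near 2.1 × 10^4 MHz·pF/mA.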
Pub Date: 2024-05-07 | DOI: 10.1016/j.vlsi.2024.102206
Yildiran Yilmaz, Fatih Gül
Computer-based machine learning algorithms that deliver high performance are computationally demanding and thus consume considerable energy during training and testing. Compact neuro-inspired devices are therefore needed to implement neural network applications with low energy and area. In this paper, the learning characteristics and performance of a nanoscale titanium dioxide (TiO2)-based synaptic device are analyzed by implementing it in a hardware neural network for digit classification. Our model is validated experimentally using 32-nm CMOS technology, and the results show high computational ability with better accuracy and more efficient resource use at low energy and area. Compared with the state-of-the-art Ag:Si synaptic device-based neural network, the proposed model achieves a 20% energy gain, a 16.82% accuracy improvement, and 18% lower total latency. Compared with software-based (i.e., computer-based) neural network implementations, our TiO2-based model reaches 90.01% accuracy on the MNIST dataset with reduced energy consumption. With its low hardware implementation cost, the model is a promising neuro-inspired hardware solution for various neural network applications. It also performs well on Fisher's Iris dataset, with 94.5% precision, 91.5% recall, a 92.9% F1-score, and 93.04% accuracy.
Published in Integration, the VLSI Journal, vol. 97, Article 102206: "Neuro-inspired hardware solutions for high-performance computing: A TiO2-based nano-synaptic device approach with backpropagation".
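The core idea of a synaptic-device network can be sketched as a crossbar in which each weight is a pair of conductances and the column currents perform the multiply-accumulate. This is a generic illustration, not the authors' TiO2 device model: the conductance range `G_MIN`/`G_MAX`, the number of levels `LEVELS`, and the update `scale` are hypothetical values, and `delta_w` stands for whatever weight change backpropagation produces.

```python
G_MIN, G_MAX, LEVELS = 1e-6, 1e-4, 64  # hypothetical device range (siemens) and programmable states

def quantize(g):
    """Clamp a conductance to the device range and snap it to the nearest programmable level."""
    g = min(max(g, G_MIN), G_MAX)
    step = (G_MAX - G_MIN) / (LEVELS - 1)
    return G_MIN + round((g - G_MIN) / step) * step

class DifferentialCrossbar:
    """Weights stored as conductance pairs: w_ij is proportional to G+_ij - G-_ij."""

    def __init__(self, n_in, n_out):
        mid = quantize((G_MIN + G_MAX) / 2)
        self.gp = [[mid] * n_out for _ in range(n_in)]
        self.gn = [[mid] * n_out for _ in range(n_in)]

    def forward(self, v):
        """Column currents by Ohm's and Kirchhoff's laws: I_j = sum_i V_i * (G+_ij - G-_ij)."""
        n_out = len(self.gp[0])
        return [sum(v[i] * (self.gp[i][j] - self.gn[i][j]) for i in range(len(v)))
                for j in range(n_out)]

    def apply_update(self, delta_w, scale=1e-5):
        """Apply a weight change as equal and opposite conductance changes, then quantize."""
        for i, row in enumerate(delta_w):
            for j, dw in enumerate(row):
                dg = dw * scale / 2
                self.gp[i][j] = quantize(self.gp[i][j] + dg)
                self.gn[i][j] = quantize(self.gn[i][j] - dg)


xbar = DifferentialCrossbar(n_in=2, n_out=2)
xbar.apply_update([[1.0, 0.0], [0.0, -1.0]])  # hypothetical backprop weight changes
currents = xbar.forward([0.2, 0.1])           # read voltages (V) -> column currents (A)
```

The differential pair lets a device with only positive conductance represent signed weights, and `quantize` models the limited number of states a real device can hold.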
Pub Date: 2024-04-30 | DOI: 10.1016/j.vlsi.2024.102202
Sachin Sachdeva, Jincong Lu, Hussam Amrouch, Sheldon X.-D. Tan
The long-term reliability of a chip, including effects such as bias temperature instability (BTI), strongly affects its operational efficiency and overall lifespan. Most studies focus on performance aspects such as delay and timing, and few examine how reliability degradation affects the spatial power density and thermal profiles of chips. In this study, we investigate for the first time the BTI impact on the spatial power density and temperature profiles of VLSI chips. We assess the BTI aging impact on on-chip spatial power density and temperature for two widely used functional blocks, a dual-port RAM and a discrete cosine transform (DCT) block, at T = 130 °C and T = 25 °C to capture worst-case BTI degradation, using degradation-aware cell libraries for a 10-year aging scenario. We also show that BTI aging-aware timing analysis is essential when evaluating the impact of aging on total power, on-chip spatial power density, and thermal maps; neglecting it can substantially underestimate these effects. We developed a power map generation method based on the circuit layout and EDA power analysis. The maximum power density of the two circuits decreases by approximately 12% and 20%, respectively. To analyze the BTI impact on spatial temperature, we built a heat transfer model imitating a real chip (Intel i7-8650U) in a multiphysics tool and performed thermal simulations to obtain spatial thermal maps. The resulting maximum temperature of the two circuits decreases by approximately 10% and 12%, respectively.
Our analysis further shows that, for a given circuit, the location of maximum power density and of the hot spot remains the same over time and is unaffected by aging. These locations can differ between circuits, however, depending mainly on the workload each circuit is handling. In addition, BTI aging effects are significantly more pronounced at higher operating temperatures (T = 130 °C) than at lower ones (T = 25 °C).
Published in Integration, the VLSI Journal, vol. 97, Article 102202: "Exploring BTI aging effects on spatial power density and temperature profiles of VLSI chips".
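The power-map step can be illustrated by binning per-cell power from a placed layout into grid tiles and dividing by tile area. This is a minimal sketch, assuming each cell is reduced to a center coordinate plus an average power taken from an EDA power report; the cell list and the uniform 0.88 scaling used for the "aged" case are made-up values, not the paper's data.

```python
def power_density_map(cells, die_w_mm, die_h_mm, nx, ny):
    """Bin (x_mm, y_mm, power_w) cell records into an ny-by-nx grid of power density (W/mm^2)."""
    tile_w, tile_h = die_w_mm / nx, die_h_mm / ny
    grid = [[0.0] * nx for _ in range(ny)]
    for x, y, p in cells:
        col = min(int(x / tile_w), nx - 1)   # clamp cells on the right/top die edge
        row = min(int(y / tile_h), ny - 1)
        grid[row][col] += p
    area = tile_w * tile_h
    return [[p / area for p in row] for row in grid]

def peak(density):
    return max(max(row) for row in density)

# Made-up cell powers for a 1 mm x 1 mm die; "aged" uses a hypothetical uniform 12% drop
fresh_cells = [(0.10, 0.10, 2.0e-3), (0.15, 0.12, 1.0e-3), (0.90, 0.90, 0.5e-3)]
aged_cells = [(x, y, 0.88 * p) for x, y, p in fresh_cells]

fresh = power_density_map(fresh_cells, 1.0, 1.0, nx=2, ny=2)
aged = power_density_map(aged_cells, 1.0, 1.0, nx=2, ny=2)
reduction = 1 - peak(aged) / peak(fresh)
```

In practice the aged powers would come from re-running power analysis with the degradation-aware libraries, and the non-uniform change across tiles is what shapes the aged map.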
Pub Date: 2024-04-27 | DOI: 10.1016/j.vlsi.2024.102198
Qiyan Sun, Ruiyong Tu, Jin Xie, Yihong Gong, Sini Wu, Jinghu Li, Zhicong Luo
Achieving low propagation delay in comparators at low input overdrive voltage is challenging. To overcome this difficulty, this paper presents a novel rail-to-rail high-speed comparator. Clamping the output node of the current summation circuit relative to a fixed level V_C reduces the overdrive recovery time for large signals. Moreover, cascading multiple high-bandwidth, low-gain stages not only increases the comparator's gain but also gives it higher bandwidth. The comparator's output is transmitted at high speed through an LVDS interface. The design is implemented in 0.18-μm SiGe BiCMOS technology. Simulation results show that the comparator has a static power consumption of 26.4 mW and an average propagation delay of about 1.09 ns for a 5 mV input overdrive.
Published in Integration, the VLSI Journal, vol. 97, Article 102198: "A rail-to-rail high speed comparator with LVDS output in 0.18-μm SiGe BiCMOS Technology".
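Why cascading low-gain, high-bandwidth stages helps can be checked with a textbook estimate. The sketch assumes identical single-pole stages, each with the same fixed gain-bandwidth product, sharing the total gain equally; n such stages with per-stage bandwidth ω0 have an overall -3 dB bandwidth of ω0·√(2^(1/n) - 1). The 10 GHz per-stage GBW is a hypothetical value, not from the paper.

```python
import math

def cascade_bandwidth(total_gain, n_stages, gbw_per_stage_hz):
    """-3 dB bandwidth of n identical single-pole stages sharing total_gain equally."""
    stage_gain = total_gain ** (1 / n_stages)
    stage_bw = gbw_per_stage_hz / stage_gain            # fixed gain-bandwidth product per stage
    shrink = math.sqrt(2 ** (1 / n_stages) - 1)         # bandwidth shrinkage of n cascaded poles
    return stage_bw * shrink

GBW = 10e9  # hypothetical 10 GHz gain-bandwidth product per stage
for n in range(1, 6):
    print(n, f"{cascade_bandwidth(1000, n, GBW) / 1e6:.1f} MHz")
```

For a total gain of 1000, one stage gives about 10 MHz while three stages give about 510 MHz under these assumptions, which is the trade-off the cascaded approach exploits.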
Pub Date: 2024-04-25 | DOI: 10.1016/j.vlsi.2024.102200
Mengdi Zhao, Hongjun Liu
To address the limitations of existing chaotic maps, we propose a non-degenerate n-dimensional (n ≥ 2) integer domain chaotic map (nD-IDCM) model that can construct any non-degenerate n-dimensional integer domain chaotic map. Lyapunov exponent analysis shows that nD-IDCM generates chaotic sequences in the integer domain, which avoids the finite-precision effects that arise when existing chaotic maps are implemented on computers or digital devices. To verify the effectiveness of nD-IDCM, we present two instances demonstrating how the positive Lyapunov exponents can be regulated by manipulating the parameter matrix, and we examine their dynamical behavior using Kolmogorov entropy, sample entropy, correlation dimension, and randomness testing with TestU01. Finally, to assess the feasibility of nD-IDCM, we design a keyed pseudo-random number generator (PRNG) based on a 3D-IDCM that offers superior randomness and unpredictability. Experimental results indicate that integer domain chaotic maps constructed with nD-IDCM have desirable Lyapunov exponents and are ergodic over a sufficiently large chaotic range.
Published in Integration, the VLSI Journal, vol. 97, Article 102200: "A non-degenerate n-dimensional integer domain chaotic map model with application to PRNG".
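Integer-domain iteration avoids finite-precision effects because every step is exact modular arithmetic. As an illustration only, the sketch below uses the well-known Arnold cat map on a 2^32 lattice, not the paper's nD-IDCM construction. For a linear map x ← A·x mod N, the Lyapunov exponents of the continuous torus counterpart are the logarithms of the absolute eigenvalues of A, which shows one way a parameter matrix can set them. The key schedule in `keyed_prng` is hypothetical, and a linear map like this is not a cryptographically secure PRNG.

```python
import math

MOD = 2 ** 32
CAT = ((1, 1), (1, 2))  # Arnold cat map matrix, det = 1 (area preserving)

def step(state, a=CAT, mod=MOD):
    """One exact integer iteration x <- A x (mod N); no rounding, so no finite-precision drift."""
    x, y = state
    return ((a[0][0] * x + a[0][1] * y) % mod, (a[1][0] * x + a[1][1] * y) % mod)

def lyapunov_exponents_2x2(a):
    """ln|eigenvalue| of a 2x2 matrix with real eigenvalues (exponents of the continuous torus map)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = math.sqrt(tr * tr - 4 * det)   # assumes a non-negative discriminant
    eigs = ((tr + root) / 2, (tr - root) / 2)
    return sorted((math.log(abs(e)) for e in eigs), reverse=True)

def keyed_prng(key, n):
    """Hypothetical keyed generator: split a 64-bit key into the initial state, emit x XOR y."""
    state = (key & 0xFFFFFFFF, (key >> 32) & 0xFFFFFFFF)
    if state == (0, 0):
        raise ValueError("the all-zero state is a fixed point")
    out = []
    for _ in range(n):
        state = step(state)
        out.append(state[0] ^ state[1])
    return out

print(lyapunov_exponents_2x2(CAT))
print(keyed_prng(0x0123456789ABCDEF, 4))
```

For the cat map the exponents are ±ln((3 + √5)/2), about ±0.962; changing the matrix entries changes them, while the iteration itself stays exact.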
Pub Date: 2024-04-24 | DOI: 10.1016/j.vlsi.2024.102199
Hui Xu, Shuo Zhu, Ruijun Ma, Zhengfeng Huang, Huaguo Liang, Haojie Sun, Chaoming Liu
CMOS devices are increasingly affected by triple-node upsets (TNUs) as transistor feature sizes shrink, particularly in radiation environments. To address the high overhead and long delay of existing radiation-hardened designs, this paper proposes a novel low-cost TNU self-recoverable latch. Simulation results show that, compared with existing TNU-hardened designs, the proposed latch reduces power consumption, delay, and power-delay product by 34.57%, 6.42%, and 34.98%, respectively.
Published in Integration, the VLSI Journal, vol. 97, Article 102199: "Design of novel low cost triple-node-upset self-recoverable hardened latch".
Pub Date: 2024-04-24 | DOI: 10.1016/j.vlsi.2024.102201
Lin Jiang, Anthony Dowling, Yu Liu, Ming-C. Cheng
An ensemble data-learning approach based on proper orthogonal decomposition (POD) and Galerkin projection (EnPOD-GP) is proposed for thermal simulation of multi-core CPUs, to improve the training efficiency and model accuracy of a previously developed global POD-GP method (GPOD-GP). GPOD-GP generates a single set of basis functions (POD modes) to capture the thermal response to variations in dynamic power maps (PMs) across the entire chip, which requires intensive computation to cover the possible variations of all power sources. EnPOD-GP instead acquires multiple sets of POD modes, which significantly improves training efficiency and effectiveness, and its simulation accuracy is independent of the dynamic PM. Compared with finite element simulation, both GPOD-GP and EnPOD-GP offer a computational speedup of more than three orders of magnitude. For a processor with few cores, GPOD-GP is the more efficient approach. When high accuracy is required and/or the processor has more cores, EnPOD-GP is preferable in terms of training effort, simulation accuracy, and efficiency. In addition, the error of EnPOD-GP can be precisely predicted for any random spatiotemporal power excitation.
Published in Integration, the VLSI Journal, vol. 97, Article 102201 (open access): "Ensemble learning model for effective thermal simulation of multi-core CPUs".
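The POD-GP idea behind both methods can be sketched on a 1D heat-conduction problem: collect temperature snapshots from a full model, extract POD modes with an SVD, and Galerkin-project the heat operator onto those modes. This sketch assumes NumPy, a uniform 1D grid with zero-temperature boundaries, backward-Euler time stepping, and a made-up two-source power map. It builds a single global mode set (the GPOD-GP pattern); how EnPOD-GP partitions training into multiple mode sets is not described here, so it is not reproduced.

```python
import numpy as np

def heat_operator(n, dx, alpha):
    """alpha * d2/dx2 on n interior nodes with zero-temperature (Dirichlet) boundaries."""
    lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1)) / dx**2
    return alpha * lap

def power_map(x, t):
    """Hypothetical dynamic power map: one oscillating 'core' and one switching on at t = 0.5."""
    core1 = np.exp(-((x - 0.3) / 0.05) ** 2) * (1.0 + np.sin(2 * np.pi * t))
    core2 = np.exp(-((x - 0.7) / 0.05) ** 2) * (1.0 if t > 0.5 else 0.0)
    return core1 + core2

def simulate_full(A, x, dt, steps):
    """Backward-Euler full-order model; returns snapshots with shape (n, steps)."""
    lhs = np.eye(len(x)) - dt * A
    T = np.zeros(len(x))
    snaps = []
    for k in range(1, steps + 1):
        T = np.linalg.solve(lhs, T + dt * power_map(x, k * dt))
        snaps.append(T)
    return np.column_stack(snaps)

def pod_modes(snapshots, energy=0.99999):
    """Left singular vectors capturing the requested fraction of snapshot energy."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    captured = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(captured, energy)) + 1
    return U[:, :k]

def simulate_rom(A, phi, x, dt, steps):
    """Galerkin-projected model: evolve modal coefficients a and reconstruct T = phi @ a."""
    A_r = phi.T @ A @ phi
    lhs = np.eye(phi.shape[1]) - dt * A_r
    a = np.zeros(phi.shape[1])
    out = []
    for k in range(1, steps + 1):
        a = np.linalg.solve(lhs, a + dt * (phi.T @ power_map(x, k * dt)))
        out.append(phi @ a)
    return np.column_stack(out)

n, alpha, dt, steps = 50, 0.01, 0.01, 100
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
A = heat_operator(n, x[1] - x[0], alpha)
full = simulate_full(A, x, dt, steps)
phi = pod_modes(full)
rom = simulate_rom(A, phi, x, dt, steps)
rel_err = np.linalg.norm(rom - full) / np.linalg.norm(full)
print(f"{phi.shape[1]} modes, relative error {rel_err:.2e}")
```

The reduced system has only as many unknowns as retained modes, which is where the speedup over a full discretization comes from; the `energy` threshold trades accuracy against that size.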
Pub Date: 2024-04-23 | DOI: 10.1016/j.vlsi.2024.102197
Mohamed Salah Azzaz, Redouane Kaibou, Bachir Madani
In this paper, a new encryption system is designed and implemented for real-time speech transmission to reduce bandwidth requirements, increase security, and minimize residual intelligibility. For robustness and lightweight computation, the cryptosystem operates in the wavelet transform domain and uses a hyperchaotic model to generate the mask and permutation keys. It was designed with a hardware-software (HW/SW) co-design approach, developing several IP cores in a relatively short development time. The system's performance and security were evaluated in simulation and then validated experimentally by transmitting an encrypted speech signal between two low-cost Nexys-4 DDR FPGA platforms in real time over both wired and wireless links. Compared with similar works, the system achieves high bandwidth efficiency owing to the DWT, a limited FPGA resource footprint, low power consumption, and a high security level, with a key space large enough to resist brute-force attacks. The system is a useful solution for real-time secure integrated voice communication in many settings, including military, professional, and personal conversations that require a high level of security.
Title: Co-design based FPGA implementation of an efficient new speech hyperchaotic cryptosystem in the transform domain. Integration-The Vlsi Journal, vol. 97, Article 102197 (published 2024-04-23).
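As a rough illustration of a transform-domain scheme with permutation and mask keys, the Python sketch below applies a one-level DWT to a speech frame, shuffles the coefficients with a chaos-derived permutation and then adds a chaos-derived mask. The abstract does not specify the hyperchaotic system, the wavelet, or how the keys are derived, so these are all assumptions. The one-dimensional logistic map is plainly swapped in as a simple chaotic (not hyperchaotic) generator, the Haar wavelet is used, and every function name is hypothetical. The sketch does not model the bandwidth reduction or the HW/SW partitioning, and it is not a secure design.

```python
import math


def logistic_sequence(x0, n, r=3.99, burn_in=100):
    """Stand-in chaotic generator (logistic map), NOT the paper's hyperchaotic model.

    x0 plays the role of the secret key and must lie strictly inside (0, 1).
    """
    x = x0
    for _ in range(burn_in):  # discard the transient so nearby keys decorrelate
        x = r * x * (1 - x)
    seq = []
    for _ in range(n):
        x = r * x * (1 - x)
        seq.append(x)
    return seq


def haar_dwt(frame):
    """One-level orthonormal Haar DWT (assumes an even frame length)."""
    s = 1 / math.sqrt(2)
    approx = [(frame[i] + frame[i + 1]) * s for i in range(0, len(frame), 2)]
    detail = [(frame[i] - frame[i + 1]) * s for i in range(0, len(frame), 2)]
    return approx + detail


def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuild each sample pair from (approx, detail)."""
    s = 1 / math.sqrt(2)
    half = len(coeffs) // 2
    frame = []
    for a, d in zip(coeffs[:half], coeffs[half:]):
        frame.extend([(a + d) * s, (a - d) * s])
    return frame


def make_keys(key, n, mask_scale=1.0):
    """Derive a permutation key and an additive mask key from one chaotic run."""
    chaos = logistic_sequence(key, 2 * n)
    perm = sorted(range(n), key=lambda i: chaos[i])       # rank order of first n values
    mask = [mask_scale * (2 * c - 1) for c in chaos[n:]]  # last n values -> [-mask_scale, mask_scale]
    return perm, mask


def encrypt_frame(frame, key):
    coeffs = haar_dwt(frame)
    perm, mask = make_keys(key, len(coeffs))
    shuffled = [coeffs[p] for p in perm]            # permutation stage
    return [c + m for c, m in zip(shuffled, mask)]  # masking stage


def decrypt_frame(cipher, key):
    perm, mask = make_keys(key, len(cipher))
    shuffled = [c - m for c, m in zip(cipher, mask)]  # undo the mask
    coeffs = [0.0] * len(cipher)
    for dst, src in enumerate(perm):                  # undo the permutation
        coeffs[src] = shuffled[dst]
    return haar_idwt(coeffs)


frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
cipher = encrypt_frame(frame, key=0.3141)
restored = decrypt_frame(cipher, key=0.3141)
print(max(abs(a - b) for a, b in zip(frame, restored)) < 1e-9)  # True
```

Because the logistic map is sensitive to its initial value, decrypting with a slightly different key (for example 0.3142) yields a different permutation and mask, so the frame is not recovered.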
Pub Date: 2024-04-20, DOI: 10.1016/j.vlsi.2024.102195
Zhe Sun, Zimeng Zhou, Fang-Wei Fu
As IoT devices become more widespread, the memory subsystem, a performance and energy bottleneck of IoT systems, has received considerable attention. A key component is on-chip memory, which bridges the performance gap between the CPU and main memory. Although many off-the-shelf embedded processors use a hybrid on-chip memory architecture containing both scratchpad memories (SPMs) and caches, most existing work ignores the interaction between the caches and the SPMs. This paper proposes static SPM allocation strategies for this architecture in IoT systems that aim to minimize the overall latency and/or energy consumption of the instruction memory subsystem. We capture intra- and inter-task cache conflict misses with a fine-grained temporal cache behavior model. Based on this conflict information, we propose an integer linear programming (ILP) algorithm that generates an optimal static function-level SPM allocation for system performance. To make the scheme scale to very large task sets, we introduce an interference factor that quantifies the interference impact. Based on this factor, we present two approximate knapsack-based heuristic algorithms that provide near-optimal static allocations at function-level and basic-block-level granularity, enabling fast design space exploration. Experimental results show that, compared with the existing function-level SPM allocation scheme, the proposed solution improves memory performance by 30.85% and reduces energy consumption by up to 31.39%. In addition, the proposed basic-block-level allocation algorithm outperforms both our function-level allocation algorithm and other basic-block-level allocation algorithms.
Title: Optimizing code allocation for hybrid on-chip memory in IoT systems. Integration-The Vlsi Journal, vol. 97, Article 102195 (published 2024-04-20).
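The abstract gives neither the ILP nor the interference-factor formula, so the Python sketch below is only a simplified stand-in for the idea of casting static SPM allocation as a knapsack problem. Each function gets a hypothetical benefit: fetch cycles saved by reading from the SPM instead of the cache, plus an assumed number of conflict misses avoided multiplied by a miss penalty. The SPM capacity is the knapsack size. The sketch contrasts the exact 0/1 knapsack dynamic program with a greedy benefit-per-byte heuristic, one common kind of approximate knapsack scheme. All names, latencies and profile numbers are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str
    size: int             # code size in bytes (assumed > 0)
    fetches: int          # profiled instruction fetches (hypothetical)
    conflict_misses: int  # hypothetical conflict misses avoided if placed in SPM


def benefit(f, hit_cycles=2, spm_cycles=1, miss_penalty=20):
    """Hypothetical cycles saved by moving f from the cached region into the SPM."""
    return f.fetches * (hit_cycles - spm_cycles) + f.conflict_misses * miss_penalty


def allocate_spm_dp(functions, capacity):
    """Exact 0/1 knapsack: maximize total benefit subject to the SPM capacity."""
    best = [0] * (capacity + 1)  # best[c] = max benefit using at most c bytes
    keep = [[False] * (capacity + 1) for _ in functions]
    for i, f in enumerate(functions):
        gain = benefit(f)
        for c in range(capacity, f.size - 1, -1):  # descending: each item used once
            if best[c - f.size] + gain > best[c]:
                best[c] = best[c - f.size] + gain
                keep[i][c] = True
    chosen, c = [], capacity
    for i in range(len(functions) - 1, -1, -1):    # backtrack through the choices
        if keep[i][c]:
            chosen.append(functions[i].name)
            c -= functions[i].size
    return best[capacity], sorted(chosen)


def allocate_spm_greedy(functions, capacity):
    """Greedy heuristic: take functions in order of benefit per byte while they fit."""
    used, total, chosen = 0, 0, []
    for f in sorted(functions, key=lambda f: benefit(f) / f.size, reverse=True):
        if used + f.size <= capacity:
            used += f.size
            total += benefit(f)
            chosen.append(f.name)
    return total, sorted(chosen)


example = [
    Function("A", 512, 10_000, 50),
    Function("B", 600, 12_000, 10),
    Function("C", 400, 3_000, 200),
    Function("D", 300, 2_000, 0),
]
print(allocate_spm_dp(example, 1024))      # (19200, ['B', 'C'])
print(allocate_spm_greedy(example, 1024))  # (18000, ['A', 'C'])
```

With these made-up numbers, the greedy heuristic keeps the densest function A and ends at 18000 saved cycles, while the exact solution picks B and C for 19200. This illustrates the trade-off the abstract describes: an exact formulation finds the optimum, while heuristics give up some optimality in exchange for speed on large task sets.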