Integration-The Vlsi Journal: Latest Articles

A low voltage input boost converter with novel switch driver enhancement technology for indoor solar energy harvesting
IF 1.9 | Zone 3, Engineering & Technology | Q3, Computer Science, Hardware & Architecture | Pub Date: 2024-05-28 | DOI: 10.1016/j.vlsi.2024.102214
Xiwen Zhu, Kaixuan Xu, Mingxue Li, Yufeng Zhang

In indoor environments, the output voltage of a small photovoltaic cell is usually too low to charge a battery or to be used directly. This paper therefore proposes a low-voltage-input boost converter with a novel switch driver enhancement technique for indoor solar energy harvesting. The boost converter uses a switched-capacitor charge pump architecture. Compared with conventional charge pumps, the driver enhancement technique improves the output current capability of the circuit and its power conversion efficiency. In addition, an adaptive dead-time circuit is designed to further optimize conversion efficiency at low input voltage. The integrated circuit (IC) of the boost converter was manufactured in a 180 nm BCD process and occupies an active chip area of 1.6 mm × 0.6 mm. Measurement results confirm that the converter boosts the input voltage by a factor of four, and the lowest start-up voltage is 0.12 V. The voltage conversion efficiency is 98 % and the highest power conversion efficiency is 76.7 % at Vin of 0.5 V. The design is suitable for indoor solar energy harvesting.
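
The figures quoted in this abstract can be related with a short calculation. The sketch below is only a sanity check, not the authors' model: it assumes that the 98 % voltage conversion efficiency applies at Vin = 0.5 V and that the ideal gain is exactly four. The function names `charge_pump_vout` and `power_conversion_efficiency` are hypothetical.

```python
def charge_pump_vout(vin, ideal_gain=4, vce=0.98):
    """Output voltage = input voltage x ideal gain x voltage conversion efficiency.

    ideal_gain and vce default to the values quoted in the abstract; applying
    vce at every input voltage is an assumption made for illustration.
    """
    return vin * ideal_gain * vce


def power_conversion_efficiency(vout, iout, vin, iin):
    """Ratio of output power to input power (dimensionless, between 0 and 1)."""
    return (vout * iout) / (vin * iin)


if __name__ == "__main__":
    # Vin = 0.5 V is the operating point quoted in the abstract.
    print(f"Vout at Vin = 0.5 V: {charge_pump_vout(0.5):.2f} V")
```

Under these assumptions a 0.5 V input would give roughly 1.96 V at the output. The abstract does not state the load current, so the power conversion efficiency function is left without a worked number.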

Integration-The Vlsi Journal, vol. 98, Article 102214
Citations: 0
Hybrid Radix-16 booth encoding and rounding-based approximate Karatsuba multiplier for fast Fourier transform computation in biomedical signal processing application
IF 1.9 | Zone 3, Engineering & Technology | Q3, Computer Science, Hardware & Architecture | Pub Date: 2024-05-28 | DOI: 10.1016/j.vlsi.2024.102215
Dinesh Kumar Jayaraman Rajanediran, Ganesh Babu C, Priyadharsini K, M. Ramkumar

Multiplication is an essential biomedical signal processing function implemented in Digital Signal Processing (DSP) cores. Approximate multiplication is used to improve the speed, area, and energy efficiency of these cores, and a low-power multiplier unit is one of the requirements a DSP processor must meet as demands increase. To balance the design metrics and error metrics of a multiplier, an efficient Hybrid Radix-16 Booth Encoding and rounding-based approximate Karatsuba Multiplier (RBEKM-16) is proposed. This research introduces an approximate Karatsuba multiplier based on rounding, which uses a rounding approximation to compute the least significant part of the product. Simple operators, such as adders and multiplexers, replace complex and costly conventional Floating-Point (FP) multipliers in this process. Radix-4 logarithms are incorporated to further reduce hardware complexity and to calculate the most significant part of the product. Subsequently, an approximate 4-2 compressor is applied in the partial product reduction stage to generate the most-significant-bit result. In the experiments, the multiplier is evaluated in terms of energy efficiency, area utilization, and error rate using the Xilinx ISE 8.1i tool. The results indicate that the proposed multiplier improves energy efficiency, uses area more effectively, and performs well in biomedical signal processing applications. The proposed 16-bit multiplier achieves an area of 1068 μm², a delay of 3.01 ns, a power consumption of 0.021 mW, and a power-delay product of 119 fJ.
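
For readers unfamiliar with the Karatsuba decomposition that this design builds on, a minimal sketch follows. It shows the exact textbook algorithm for 16-bit operands split into 8-bit halves. It does not reproduce the rounding approximation, the Booth encoding, or the 4-2 compressor described in the abstract, and the function name `karatsuba16` is hypothetical.

```python
def karatsuba16(a: int, b: int) -> int:
    """Exact 16x16-bit product using three half-width multiplications.

    The middle multiplication operates on 9-bit sums of the 8-bit halves.
    """
    assert 0 <= a < (1 << 16) and 0 <= b < (1 << 16)
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF

    hi = a_hi * b_hi                                # most significant part
    lo = a_lo * b_lo                                # least significant part
    mid = (a_hi + a_lo) * (b_hi + b_lo) - hi - lo   # cross terms

    return (hi << 16) + (mid << 8) + lo


if __name__ == "__main__":
    print(karatsuba16(1234, 5678) == 1234 * 5678)
```

The split into `hi`, `mid`, and `lo` is what makes a hybrid design possible: according to the abstract, the least significant part is approximated by rounding while the most significant part is computed by a different path.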

Integration-The Vlsi Journal, vol. 98, Article 102215
Citations: 0
Content-addressable memory using selective-charging and adaptive-discharging scheme for low-power hardware search engine
IF 1.9 | Zone 3, Engineering & Technology | Q3, Computer Science, Hardware & Architecture | Pub Date: 2024-05-27 | DOI: 10.1016/j.vlsi.2024.102213
Sheikh Wasmir Hussain, Telajala Venkata Mahendra, Sandeep Mishra, Anup Dandapat

The single-clock-cycle access of content-addressable memory (CAM) is well suited to high-speed parallel content search in data-intensive hardware search engines. Its applications range from accelerating databases and routing networks to processing images, implementing machine learning, processing biomedical data, and compressing data. Nevertheless, the CAM macro consumes significant energy because most match-lines (MLs), which comprise CAM words, switch during parallel access. Segmented ML schemes reduce power, yet the cell and ML delays and the extra sequential cycles degrade search speed. A novel selective-charging and adaptive-discharging (SCAD) scheme in the form of a dynamic ML architecture is proposed to reduce CAM power consumption at no extra cycle cost. Additionally, a full-swing CAM cell forms the basis of storage and comparison-evaluation to lessen ML delay. Based on 45-nm technology under a 1-V supply, the proposed 64 × 32-bit and 256 × 144-bit SCAD-CAM arrays dissipate only 0.45–0.46 fJ/bit/search and achieve high speed. Compared to CAMs based on low-power ML schemes (low-swing precharge, division and control, and master–slave) and to the conventional CAM as the baseline design, the SCAD-CAM reduces energy-delay by 13.49%–89.35%. The average-power reduction of 1.8×–2.4× establishes the SCAD-CAM as a promising memory architecture for emerging search-intensive applications involving large-scale data workloads.
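
The behavior of a CAM and the meaning of the fJ/bit/search metric can be sketched in a few lines. This is a purely functional model under stated assumptions: it ignores match-line charging and discharging entirely, and it uses 0.45 fJ/bit/search for the 64 × 32-bit array only as an illustration, since the abstract does not say which end of the 0.45–0.46 range belongs to which array. The names `cam_search` and `search_energy_fj` are hypothetical.

```python
def cam_search(words, key):
    """Return the addresses of all stored words equal to the search key.

    A hardware CAM compares every word in parallel in one cycle; this loop
    only models the result, not the timing.
    """
    return [addr for addr, word in enumerate(words) if word == key]


def search_energy_fj(rows, bits_per_word, energy_per_bit_fj):
    """Energy of one full-array search, in femtojoules."""
    return rows * bits_per_word * energy_per_bit_fj


if __name__ == "__main__":
    table = [0x0A0A, 0x1234, 0xBEEF, 0x1234]
    print(cam_search(table, 0x1234))
    print(f"{search_energy_fj(64, 32, 0.45):.1f} fJ per search")
```

With these assumed numbers, one search of the 64 × 32-bit array costs about 921.6 fJ, which is the per-bit figure multiplied by the 2048 bits compared in parallel.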

Integration-The Vlsi Journal, vol. 98, Article 102213
Citations: 0
SET-detection low complexity burst error correction codes for SRAM protection
IF 1.9 | Zone 3, Engineering & Technology | Q3, Computer Science, Hardware & Architecture | Pub Date: 2024-05-25 | DOI: 10.1016/j.vlsi.2024.102212
He Liu, Jiaqiang Li, Liyi Xiao, Tianqi Wang, Jie Li

As transistor feature sizes decrease, multiple bit upsets and single event transient (SET) effects become severe in circuits operating in radiation environments. In static random-access memories (SRAM), both single event upsets (SEUs) and single event transients must be addressed. Fault-tolerant error correction codes (ECCs) are an option for SRAM protection because they can handle SEUs and SETs at the same time. We designed a series of low-complexity burst error correcting codes with a fault detection feature. These codes can handle burst errors in memories and transient errors in the decoder. The low-complexity ECC simplifies the decoding circuits and reduces hardware overhead. Compared with existing schemes that handle SETs in decoders, the proposed scheme has a clear advantage in area overhead and can be an effective choice for SRAM protection in radiation environments.

Integration-The Vlsi Journal, vol. 98, Article 102212
Citations: 0
Analysis of a new three-dimensional jerk chaotic system with transient chaos and its adaptive backstepping synchronous control
IF 1.9 | Zone 3, Engineering & Technology | Q3, Computer Science, Hardware & Architecture | Pub Date: 2024-05-24 | DOI: 10.1016/j.vlsi.2024.102210
Shaohui Yan, Jianjian Wang, Lin Li

A new three-dimensional jerk chaotic system with line equilibrium points is proposed. The system is studied in detail using the Lyapunov exponent graph, bifurcation diagram, phase diagram, and time-domain waveform diagram, which show that it has rich dynamical behaviors, such as eight types of coexisting attractors, extreme multistability of four different attractor states, and offset boosting in two directions. In addition, the system has six types of transient chaos, which greatly increase its complexity. We study how the spectral entropy (SE) and C0 complexity vary when the system takes different initial values. The initial conditions under which the system is synchronized are then selected from the initial values with higher complexity. The correctness of the theoretical analysis and numerical simulation is verified by circuit simulation and hardware experiments. Finally, the new system achieves synchronization control using a purpose-designed adaptive backstepping controller, laying the foundation for its subsequent use in secure communications.

Integration-The Vlsi Journal, vol. 98, Article 102210
Citations: 0
AiTO: Simultaneous gate sizing and buffer insertion for timing optimization with GNNs and RL
IF 1.9 | Zone 3, Engineering & Technology | Q3, Computer Science, Hardware & Architecture | Pub Date: 2024-05-21 | DOI: 10.1016/j.vlsi.2024.102211
Hongxi Wu, Zhipeng Huang, Xingquan Li, Wenxing Zhu

Gate sizing and buffer insertion for timing optimization are performed extensively in electronic design automation (EDA) flows. Both aim to adjust the upstream and downstream capacitances of gates and buffers to minimize delay. However, most existing work addresses gate sizing or buffer insertion independently. This paper proposes a learning-based timing optimization framework, AiTO, that combines reinforcement learning (RL) with a graph neural network (GNN) to perform gate sizing and buffer insertion simultaneously. We model buffer insertion as a special case of gate sizing by determining possible buffer locations in advance and treating buffer insertion and gate sizing as a single RL process. Experimental results on 10 real designs (28-nm and 110-nm) show that AiTO achieves better worst negative slack (WNS) optimization results than OpenROAD and can, to some extent, improve the results of the commercial tool Innovus. Moreover, ablation studies demonstrate the benefits of performing gate sizing and buffer insertion simultaneously for timing optimization.
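
The WNS metric reported in this abstract can be illustrated with a small sketch. The definitions below follow one common convention, in which slack is required time minus arrival time and WNS is clamped at zero when every endpoint meets timing. Tools differ on that clamping, so it is an assumption here, as are the function names and the sample slack values.

```python
def slack(required_time, arrival_time):
    """Setup slack of one timing endpoint; negative means a violation."""
    return required_time - arrival_time


def worst_negative_slack(slacks):
    """Most negative endpoint slack, or 0.0 if no endpoint violates timing."""
    return min(0.0, min(slacks))


def total_negative_slack(slacks):
    """Sum of all negative endpoint slacks."""
    return sum(s for s in slacks if s < 0)


if __name__ == "__main__":
    # Hypothetical endpoint slacks in nanoseconds.
    endpoint_slacks = [0.12, -0.05, -0.30, 0.02]
    print(worst_negative_slack(endpoint_slacks))
    print(round(total_negative_slack(endpoint_slacks), 2))
```

Resizing a gate or inserting a buffer changes arrival times along the affected paths, so an optimizer of the kind described above would be rewarded when these numbers move toward zero.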

Integration-The Vlsi Journal, vol. 98, Article 102211
Citations: 0
A 8.83 ppm/°C temperature coefficient, 75 dB PSRR subthreshold CMOS voltage reference with piecewise curvature compensation
IF 1.9 | Zone 3, Engineering & Technology | Q3, Computer Science, Hardware & Architecture | Pub Date: 2024-05-20 | DOI: 10.1016/j.vlsi.2024.102209
Tiedong Cheng, Hao Rao, Jinxiang Wei

This paper proposes a low-power subthreshold CMOS voltage reference (CVR) with a low temperature coefficient (TC) over a wide temperature range. The proposed CVR uses the ΔVGS of different-threshold and same-threshold nMOS pairs to generate complementary-to-absolute-temperature (CTAT) and proportional-to-absolute-temperature (PTAT) voltages, respectively. To compensate the low-temperature and high-temperature segments of the temperature characteristic curve, it uses nonlinear compensation currents generated by the exponential-like relationship between the drain current and the gate-source voltage of two MOSFETs working in the subthreshold region. Based on a 0.18-μm CMOS process, post-layout simulation results show that the proposed CVR achieves an average output voltage of 263 mV. The power supply ripple rejection (PSRR) is −75 dB at 10 Hz and the line sensitivity (LS) is 0.0069 %/V when the supply voltage varies from 0.8 V to 2.5 V. The average TC is 8.83 ppm/°C over a wide temperature range of −40 °C–120 °C, and the minimum TC is only 3.65 ppm/°C.
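
To give a feel for what these figures imply, the sketch below applies the usual box-method definition of temperature coefficient and the usual definition of line sensitivity. Both definitions are assumptions, since the abstract does not state how the values were computed, and the function names are hypothetical.

```python
def temperature_coefficient_ppm(v_max, v_min, v_avg, t_min, t_max):
    """Box-method TC in ppm/degC: output spread over (average x temperature span)."""
    return (v_max - v_min) / (v_avg * (t_max - t_min)) * 1e6


def line_sensitivity_pct_per_v(dv_ref, v_ref, dv_supply):
    """Line sensitivity in %/V: relative output change per volt of supply change."""
    return dv_ref / (v_ref * dv_supply) * 100.0


if __name__ == "__main__":
    v_avg = 0.263                      # average output voltage from the abstract, in volts
    span = 120 - (-40)                 # temperature range from the abstract, in degC
    spread = 8.83e-6 * v_avg * span    # output spread implied by 8.83 ppm/degC
    print(f"Implied output spread: {spread * 1e6:.1f} uV")
    print(f"{temperature_coefficient_ppm(v_avg + spread, v_avg, v_avg, -40, 120):.2f} ppm/degC")
```

Under these definitions, 8.83 ppm/°C over 160 °C corresponds to an output spread of roughly 0.37 mV around 263 mV, and 0.0069 %/V over the 1.7 V supply range corresponds to a change of roughly 31 μV.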

Integration-The Vlsi Journal, vol. 97, Article 102209
Citations: 0
A novel low-resource consumption and high-speed hardware implementation of HOG feature extraction on FPGA for human detection
IF 1.9 | Zone 3, Engineering & Technology | Q3, Computer Science, Hardware & Architecture | Pub Date: 2024-05-15 | DOI: 10.1016/j.vlsi.2024.102208
Yuhai He, Jiye Huang, Yiming Pan

In today's increasingly complex traffic environment, pedestrian detection has become increasingly important. The Histogram of Oriented Gradients (HOG) algorithm has been proven to be highly efficient in pedestrian detection. This paper proposes a low-resource, high-speed hardware implementation of the HOG algorithm. At the cost of a slight loss in accuracy, it increases computational speed and reduces resource consumption. Experimental results demonstrate that the implementation achieves a speed of 0.933 pixels per clock cycle and consumes 4117 look-up tables and 4.5 Kbits of block RAM, while its accuracy decreases by 1.2% on the INRIA dataset and by 0.11% on the MIT dataset.
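
The core computation that such an implementation accelerates is the per-cell orientation histogram. A minimal software sketch follows, assuming centered differences, unsigned orientations over 0 to 180 degrees, nine bins, and hard binning without interpolation. These are common HOG choices rather than details taken from the paper, whose hardware simplifications are not described in the abstract. The function name `cell_histogram` is hypothetical.

```python
import math


def cell_histogram(cell, bins=9):
    """Unsigned-orientation gradient histogram of one cell.

    Only interior pixels are used, so no padding is needed. Each pixel adds
    its gradient magnitude to the single bin that contains its orientation.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal centered difference
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical centered difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


if __name__ == "__main__":
    # 8x8 cell whose intensity increases from left to right.
    ramp = [[x for x in range(8)] for _ in range(8)]
    print(cell_histogram(ramp))
```

For the left-to-right ramp every interior gradient points along the x axis, so all of the magnitude lands in the first bin. The square root and arctangent in this loop are the operations a hardware design typically approximates to save resources.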

Integration-The Vlsi Journal, vol. 97, Article 102208
Citations: 0
Design of synthesizable period-jitter sensor IP with high power reduction and variation resiliency
IF 1.9 | Zone 3, Engineering & Technology | Q3, Computer Science, Hardware & Architecture | Pub Date: 2024-05-11 | DOI: 10.1016/j.vlsi.2024.102207
Jinn-Shyan Wang, Yu-Hsuan Kuo

Previous work presented a synthesizable design approach that eases the design of a high-resolution on-chip period-jitter sensor (PJS). Although designers of very large scale integration (VLSI) chips would like to use this design as an intellectual property (IP) block, our analysis reveals that this PJS faces two key challenges: high power consumption and vulnerability to static PVT and dynamic IR-drop variations. This work develops several design techniques that address these challenges at the same time. Taking the PJS IP for monitoring the clock signal in LPDDR4-4266 as a design example, we implement a synthesized 22 nm, 2.133 GHz PJS with a resolution of 1.0 ps to verify the design techniques. Post-layout simulation results show that the new design reduces power by more than half while meeting the resolution specification. It passes functional and electrical verification over a broader range of process variation than the previous design, and this higher variation resiliency makes the synthesizable Verilog code more suitable for use as a soft IP.
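
The quantity such a sensor measures can be sketched behaviorally. The code below takes period jitter as the deviation of each clock period from the nominal period and quantizes it to the sensor resolution. It models only the measurement, not the circuit, and the edge timestamps and the function name `period_jitter_ps` are hypothetical.

```python
def period_jitter_ps(edges_ps, nominal_period_ps, resolution_ps=1.0):
    """Per-cycle period deviation from nominal, quantized to the sensor resolution.

    edges_ps is a list of consecutive rising-edge timestamps in picoseconds.
    """
    periods = [later - earlier for earlier, later in zip(edges_ps, edges_ps[1:])]
    return [
        round((period - nominal_period_ps) / resolution_ps) * resolution_ps
        for period in periods
    ]


if __name__ == "__main__":
    nominal = 1e12 / 2.133e9           # period of a 2.133 GHz clock, about 468.8 ps
    edges = [0.0, 470.0, 938.0, 1407.0]
    print(f"Nominal period: {nominal:.1f} ps")
    print(period_jitter_ps(edges, nominal))
```

At 2.133 GHz the nominal period is about 468.8 ps, so a 1.0 ps resolution corresponds to resolving roughly 0.2 % of a clock period.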

Citations: 0
A logic device based on memristor-diode crossbar and CMOS periphery as spike router for hardware neural network
IF 1.9 Zone 3 (Engineering & Technology) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-05-08 DOI: 10.1016/j.vlsi.2024.102203
A.N. Busygin, S. Yu Udovichenko, A.D. Pisarev, A.H.A. Ebrahim, A.A. Gubin

A programmable logic device based on a memristor-diode crossbar and CMOS logic has been developed. The crossbar implements NAND logic gates using memristor ratioed logic and CMOS inverters. The digitally controlled peripheral circuit transmits digital signals and allows the memristor states in the crossbar to be modified and evaluated. The proposed logic device circuit requires fewer transistors and less chip area than known analogues.

The maximum size of the crossbar in a logic device is estimated by circuit-level numerical simulation. The size is limited by the degradation of the logic-level voltages in the memristor-diode crossbar. The operability of the peripheral circuits as part of the complete electrical circuit of the logic device is demonstrated by simulating the execution of logical operations and the modification and evaluation of the states of individual memristors.

Citations: 0