IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2024-08-26 | DOI: 10.1109/TVLSI.2024.3422690
Vol. 32, No. 9, pp. C2-C2 | PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10648914
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2024-08-26 | DOI: 10.1109/TVLSI.2024.3435251
Vol. 32, No. 9, pp. C3-C3 | PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10648917
Citations: 0
A Single-Stage Gain-Boosted Cascode Amplifier With Three-Layer Cascode Feedback Amplifier for Front-End SHA in High-Linearity Pipelined ADC
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2024-08-19 | DOI: 10.1109/TVLSI.2024.3439374
Yu Liu;Yupeng Shen;Mingliang Chen;Hui Xu;Xubin Chen;Jiarui Liu;Zhiyu Wang;Faxin Yu
In this brief, a high-gain, wide-bandwidth single-stage gain-boosted cascode amplifier (GBCA) is proposed for the front-end sample-and-hold amplifier (SHA) in a 14-bit 2.5-GS/s pipelined analog-to-digital converter (ADC). The GBCA is composed of a two-layer main cascode amplifier and a three-layer cascode feedback amplifier (FA). The three-layer cascode structure provides more than 20 dB of additional gain compared with conventional two-layer FAs. However, adjacent poles appear near the gain-bandwidth product (GBW) of the three-layer cascode FA, which may seriously degrade the phase margin (PM) of the FA and prolong the settling time of the closed-loop GBCA. A PM expansion technique is proposed that improves the PM of the FA by adding a switched-capacitor array. The open-loop GBCA achieves 104-dB direct-current (dc) gain and 65.2-GHz GBW, which satisfies the stringent requirements of the ping-pong interleaved SHA with 12-dB gain on chip. The pipelined ADC, fabricated in a 28-nm CMOS process, consumes 554 mW at a 2.5-GS/s sampling rate while achieving a signal-to-noise-and-distortion ratio (SNDR) of 52.5 dB and a spurious-free dynamic range (SFDR) of 86.4 dBc with a 161-MHz input signal.
Vol. 33, No. 1, pp. 47-51
Citations: 0
A 65 nm CMOS Analog Programmable Standard Cell Library for Mixed-Signal Computing
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2024-08-19 | DOI: 10.1109/TVLSI.2024.3432916
Pranav O. Mathews;Praveen Raj Ayyappan;Afolabi Ige;Swagat Bhattacharyya;Linhao Yang;Jennifer O. Hasler
Integrated circuit (IC) design for analog computing requires toolflows and synthesis similar to those of large-scale digital systems, in turn necessitating a library of general-purpose analog cells. To this end, we present a programmable, floating-gate (FG)-based analog standard cell library in a commercially available 65 nm process that allows analog IC designers to use synthesis tools with an abstracted design mindset similar to large-scale digital design. We fabricate the test cells, which include filters with programmable corners, an analog classifier, and an arbitrary waveform generator (AWG); experimentally characterize FG programming; and experimentally demonstrate the performance of the standard cells. Overall, the standard cells achieve a similar or smaller footprint than previous approaches while leveraging the benefits of FG programming at smaller technology nodes.
Vol. 32, No. 10, pp. 1830-1840
Citations: 0
Unrolled, Pipelined, and Stage-Folded Architectures for Encoding of Multi-Kernel Polar Codes
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2024-08-14 | DOI: 10.1109/TVLSI.2024.3436872
Hossein Rezaei;Elham Abbasi;Nandana Rajatheva;Matti Latva-Aho
Over the past decade, polar codes have attracted significant attention and have been selected as the coding method for the control channel in fifth-generation (5G) wireless communication systems. However, conventional polar codes rely solely on binary ($2 \times 2$) kernels, which restricts their block length to powers of 2. In response, multi-kernel (MK) polar codes have been proposed as a viable solution to achieve increased flexibility in code length. This article proposes unrolled and pipelined architectures for encoding both systematic and nonsystematic MK polar codes, capable of high-throughput encoding of codes constructed with binary, ternary ($3 \times 3$), or binary-ternary mixed kernels. Furthermore, two novel nonsystematic stage-folded encoders, designed to minimize resource usage, are introduced for the encoding of pure-ternary and MK codes. The proposed MK encoders additionally provide the functionality of dynamic kernel assignment. The proposed architectures exhibit an unprecedented level of flexibility by supporting 83 different codes and offering various architectures that provide tradeoffs between throughput and resource consumption. The FPGA implementation results demonstrate that a partially pipelined polar encoder of size $N=4096$ operating at a frequency of 270 MHz gives a throughput of 1080 Gb/s. In addition, a new compiler scripted in Python is introduced to automatically generate HDL modules for the desired encoders. By supplying the desired parameters, a designer can obtain all the necessary VHDL files for FPGA implementation.
Vol. 32, No. 11, pp. 2107-2120
Citations: 0
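The multi-kernel idea in the abstract above can be sketched as follows: the generator matrix is built as a Kronecker product of small kernels over GF(2), so the block length is the product of the kernel sizes rather than a power of 2. This is a minimal illustration, not the article's architecture. The ternary kernel `T3`, the helper names, and the omission of frozen bits and permutations are all assumptions. The last lines check the quoted throughput under the assumption of one 4096-bit codeword per clock cycle.

```python
# Illustrative sketch only: kernel choice and helper names are assumptions.
T2 = [[1, 0],
      [1, 1]]            # binary (Arikan) kernel
T3 = [[1, 1, 1],
      [1, 0, 1],
      [0, 1, 1]]         # an assumed ternary kernel, not taken from the abstract


def kron_gf2(a, b):
    """Kronecker product of two 0/1 matrices; entries stay in GF(2)."""
    return [[x & y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def generator(kernels):
    """Generator matrix as the Kronecker product of the listed kernels."""
    g = [[1]]
    for kernel in kernels:
        g = kron_gf2(g, kernel)
    return g


def encode(u, kernels):
    """Nonsystematic encoding x = u * G over GF(2)."""
    g = generator(kernels)
    n = len(g)
    if len(u) != n:
        raise ValueError("input length must equal the block length")
    return [sum(u[i] & g[i][j] for i in range(n)) % 2 for j in range(n)]


print(len(generator([T2, T3])))            # mixed kernels give block length 2 * 3
print(encode([0, 0, 0, 1], [T2, T2]))      # selects the last row of T2 (x) T2

# Throughput check, assuming one N-bit codeword leaves the encoder per cycle.
print(4096 * 270e6 / 1e9)                  # Gb/s with 1 Gb = 10^9 bits
print(4096 * 270 / 1024)                   # Gb/s if 1 Gb is read as 1024 Mb
```

With decimal units the product is about 1106 Gb/s. The quoted 1080 Gb/s is reproduced only when 1 Gb is read as 1024 Mb, which is a guess about the convention used rather than something the abstract states.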
The Conjugated Current Mirrors: A General Enhancement in Transconductance Amplifiers
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2024-08-14 | DOI: 10.1109/TVLSI.2024.3439525
Meysam Akbari;Kea-Tiong Tang
This work presents a general enhancement in operational transconductance amplifiers (OTAs) by conjugating the diode-connected topologies of the current mirrors (CMs). The proposed conjugation method provides an internal high-impedance node, by which the transconductance of the amplifier is significantly increased. Since the central node of the conjugated CMs is virtually grounded for small differential signals, the cascode devices of the diode-connected topologies can be employed as an extra differential pair, further enhancing the transconductance. Moreover, the large-signal behavior of the circuit shows that the conjugated CMs are capable of copying a dynamic current with a higher gain than a traditional CM amplifier. This advantage results in faster charging and discharging of the output capacitive load, which provides a larger slew rate (SR) without increasing the quiescent current. The proposed amplifier was manufactured in TSMC 0.18-$\mu$m CMOS technology, occupying a silicon area of $55.5 \times 48.9~\mu$m. Experimental results at a supply voltage of 1.8 V show a gain bandwidth (GBW) of 104.9 MHz, a dc gain of 79.1 dB, and an SR of 55.7 V/$\mu$s for a capacitive load of 10 pF, while the circuit consumes 489 $\mu$W of power.
Vol. 32, No. 10, pp. 1801-1811
Citations: 0
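The slew-rate claim in the abstract above can be checked with a back-of-envelope calculation from the quoted figures alone. The sketch assumes the load is purely capacitive (so $I = C \, dV/dt$) and that the 489 $\mu$W figure is the static power at 1.8 V. Both are assumptions, and the variable names are illustrative.

```python
# Back-of-envelope check using only numbers quoted in the abstract.
slew_rate = 55.7e6      # V/s  (55.7 V/us)
c_load = 10e-12         # F    (10 pF)
power = 489e-6          # W    (assumed to be static power)
v_supply = 1.8          # V

i_slew = slew_rate * c_load     # current needed to slew the load: I = C * dV/dt
i_supply = power / v_supply     # supply current implied by the power figure

print(f"slewing current ~ {i_slew * 1e6:.0f} uA")
print(f"supply current  ~ {i_supply * 1e6:.0f} uA")
```

Under these assumptions the current delivered to the load while slewing is roughly twice the implied supply current, which is consistent with the abstract's point that the mirrors copy a dynamic current with gain rather than relying on a larger quiescent bias.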
Design of Low-Complexity Quantized Compressive Sensing Using Measurement Predictive Coding
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2024-08-12 | DOI: 10.1109/TVLSI.2024.3438249
Lakshmi Bhanuprakash Reddy Konduru;Vikramkumar Pudi;Balasubramanyam Appina
Block-based compressive sensing (BCS) has emerged as a promising method for smart devices with limited bandwidth and computing capabilities, striking a balance between image/video quality and transmission efficiency. Despite its advantages, BCS falls short in reducing bitrate compared with traditional acquisition systems, because it increases the number of bits per measurement, which leads to high storage and transmission costs. In this context, we propose measurement predictive coding (MPC) together with a quantization method integrated with BCS, named BCS-MPC; here, the quantization is performed with bit shifts only instead of binary division. The proposed method reduces the number of bits per compressive sensing (CS) measurement as well as the transmission of the quantization step size. Furthermore, it reduces latency and hardware resources. The proposed method improves PSNR by 3.44 to 8.28 dB on average over existing works. According to the synthesis results, the proposed BCS-MPC method requires 26.11%, 18.89%, and 82.53% less area, power, and delay, respectively, than the existing work. The reduction in delay is achieved through the bit-shift operations.
Vol. 33, No. 1, pp. 288-292
Citations: 0
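The combination of predictive coding and shift-only quantization described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the article's design: the predictor is simply the previously reconstructed measurement, the step size is fixed at `2**shift`, and the function names are hypothetical.

```python
def mpc_encode(measurements, shift):
    """Quantize prediction residuals with a right shift (step size 2**shift)."""
    pred = 0
    codes = []
    for y in measurements:
        q = (y - pred) >> shift        # arithmetic shift replaces division by 2**shift
        codes.append(q)
        pred += q << shift             # track what the decoder will reconstruct
    return codes


def mpc_decode(codes, shift):
    """Rebuild measurements by accumulating dequantized residuals."""
    pred = 0
    out = []
    for q in codes:
        pred += q << shift
        out.append(pred)
    return out


measurements = [100, 104, 97, 130]     # made-up integer CS measurements
codes = mpc_encode(measurements, shift=2)
decoded = mpc_decode(codes, shift=2)
print(codes)
print(decoded)
```

Because the encoder predicts from its own reconstruction, the quantization error does not accumulate: each decoded value stays within one step (`2**shift`) of the original, while the residual codes are much smaller than the raw measurements after the first sample.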
M2M: A Fine-Grained Mapping Framework to Accelerate Multiple DNNs on a Multi-Chiplet Architecture
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2024-08-12 | DOI: 10.1109/TVLSI.2024.3438549
Jinming Zhang;Xuyan Wang;Yaoyao Ye;Dongxu Lyu;Guojie Xiong;Ningyi Xu;Yong Lian;Guanghui He
With the advancement of artificial intelligence, the collaboration of multiple deep neural networks (DNNs) has become crucial to existing embedded systems and cloud systems, especially for autonomous driving applications as well as augmented and virtual reality (AR/VR) applications. To trade off between cost and performance, chiplet-based DNN accelerators have emerged as a promising solution for accelerating DNN workloads. However, most existing mapping methods for multiple DNNs target monolithic chips and fail to solve the problems faced by the emerging multi-chiplet architecture, such as distributed memory access, complex heterogeneous interconnect networks, and the scaling-up of computing resources. In this work, we propose M2M, a fine-grained mapping framework for accelerating multiple DNNs on a multi-chiplet architecture. It includes temporal and spatial task scheduling for reconfigurable dataflow accelerators and communication-aware task mapping in a heterogeneous interconnect network. To enhance communication efficiency and reduce the overall latency, we further propose a fine-tuned quality-of-service (QoS) policy for network-on-package (NoP) links. To the best of our knowledge, this is the first fine-grained mapping framework for multiple DNNs on a multi-chiplet architecture. We implemented the proposed fine-grained mapping framework using a genetic algorithm and a simulated annealing algorithm. Experimental results show that our work achieves 7.18%–61.09% latency reduction under vision, language, and mixed workloads when compared with the state-of-the-art related work.
Vol. 32, No. 10, pp. 1864-1877
Citations: 0
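The abstract above mentions simulated annealing as one search method for communication-aware mapping. The following is a generic, minimal sketch of that kind of search, not the M2M framework itself. The cost model (traffic volume times hop count), the 2x2 mesh, the task chain, and all names are hypothetical.

```python
import math
import random


def comm_cost(mapping, traffic, hops):
    """Total traffic volume weighted by hop distance between assigned chiplets."""
    return sum(vol * hops[mapping[a]][mapping[b]] for (a, b), vol in traffic.items())


def anneal(mapping, traffic, hops, steps=5000, t0=5.0, seed=0):
    """Swap-based simulated annealing; swaps keep per-chiplet task counts fixed."""
    rng = random.Random(seed)
    current = list(mapping)
    cost = comm_cost(current, traffic, hops)
    best, best_cost = list(current), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9      # linear cooling schedule
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = comm_cost(current, traffic, hops)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        else:
            current[i], current[j] = current[j], current[i]   # undo the swap
    return best, best_cost


# Hypothetical setup: 4 chiplets in a 2x2 mesh, 8 tasks forming a chain.
hops = [[0, 1, 1, 2],
        [1, 0, 2, 1],
        [1, 2, 0, 1],
        [2, 1, 1, 0]]
traffic = {(t, t + 1): 10 for t in range(7)}
start = [t % 4 for t in range(8)]                    # round-robin placement
best, best_cost = anneal(start, traffic, hops)
print(comm_cost(start, traffic, hops), "->", best_cost)
```

Swapping two tasks rather than reassigning one keeps every chiplet's load constant, which stands in for a capacity constraint; without it the cheapest mapping would trivially place every task on one chiplet.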
DLB-CNet: Difference Learning-Based Convolution Network for Building Change Detection
IF 2.8 | Zone 2, Engineering & Technology | Q2, Computer Science, Hardware & Architecture | Pub Date: 2024-08-09 | DOI: 10.1109/TVLSI.2024.3438728
Zipeng Fan;Sanqian Wang;Xueting Pu;Yuting Cong;Yuan Liu;Xiubao Sui;Qian Chen
Change detection (CD) in remote sensing (RS) images is a technique used to analyze and characterize surface changes from remotely sensed data at different time periods. However, current deep-learning-based methods sometimes struggle with the diversity of targets in complex RS scenarios, leading to issues, such as false detections and loss of detail. To address these challenges, we propose a method called difference learning-based convolution and network (DLB-CNet) for building CD (BCD). In DLB-CNet, we use difference learning module (DLM), accomplishing the extraction of building change features by enhancing the feature differences between the two images and enhancing model robustness. Additionally, an innovative attention module called integration attention (IA) is introduced to efficiently process semantic information by jointly focusing on global representation subspaces. Our model achieves impressive results on the LEVIR-CD dataset, WHU-CD dataset, and CDD dataset, with ${F}1$ -scores of 90.56%, 92.28%, and 94.98%, respectively, demonstrating its superiority over the state-of-the-art methods.
遥感(RS)图像中的变化检测(CD)是一种用于分析和描述不同时间段遥感数据表面变化的技术。然而,目前基于深度学习的方法有时难以应对复杂 RS 场景中目标的多样性,从而导致误检测和细节丢失等问题。为了应对这些挑战,我们提出了一种名为基于差分学习的卷积和网络(DLB-CNet)的方法,用于构建 CD(BCD)。在 DLB-CNet 中,我们使用了差异学习模块(DLM),通过增强两幅图像之间的特征差异来完成建筑物变化特征的提取,并增强模型的鲁棒性。此外,我们还引入了创新的注意力模块--整合注意力(IA),通过共同关注全局表示子空间来有效处理语义信息。我们的模型在LEVIR-CD数据集、WHU-CD数据集和CDD数据集上取得了令人印象深刻的结果,{F}1$得分率分别为90.56%、92.28%和94.98%,这表明它优于最先进的方法。
{"title":"DLB-CNet: Difference Learning-Based Convolution Network for Building Change Detection","authors":"Zipeng Fan;Sanqian Wang;Xueting Pu;Yuting Cong;Yuan Liu;Xiubao Sui;Qian Chen","doi":"10.1109/TVLSI.2024.3438728","DOIUrl":"10.1109/TVLSI.2024.3438728","url":null,"abstract":"Change detection (CD) in remote sensing (RS) images is a technique used to analyze and characterize surface changes from remotely sensed data at different time periods. However, current deep-learning-based methods sometimes struggle with the diversity of targets in complex RS scenarios, leading to issues, such as false detections and loss of detail. To address these challenges, we propose a method called difference learning-based convolution and network (DLB-CNet) for building CD (BCD). In DLB-CNet, we use difference learning module (DLM), accomplishing the extraction of building change features by enhancing the feature differences between the two images and enhancing model robustness. Additionally, an innovative attention module called integration attention (IA) is introduced to efficiently process semantic information by jointly focusing on global representation subspaces. 
Our model achieves impressive results on the LEVIR-CD dataset, WHU-CD dataset, and CDD dataset, with \u0000<inline-formula> <tex-math>${F}1$ </tex-math></inline-formula>\u0000-scores of 90.56%, 92.28%, and 94.98%, respectively, demonstrating its superiority over the state-of-the-art methods.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 11","pages":"2037-2045"},"PeriodicalIF":2.8,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A 20-V Pulse Driver Based on All-nMOS Charge Pump Without Reversion Loss and Overstress in 65-nm Standard CMOS Technology
IF 2.8 Zone 2 (Engineering Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-08-09 DOI: 10.1109/TVLSI.2024.3435974
Ziliang Zhou;Min Tan
This article proposes a high-efficiency all-nMOS bidirectional charge pump (CP) cell and constructs a CP-based high-voltage (HV) pulse driver based on it. Double-diode substrate isolation (DDSI) can extend the maximum supported voltage in a bulk CMOS process, but it requires an all-nMOS implementation of CP cells. Existing all-nMOS CPs either do not support the bidirectional charge transfer required for HV pulse drivers, or achieve it with additional penalties such as reversion charge loss and overstress on transistors. The proposed all-nMOS CP with novel gate voltage control strategies is the first one reported in the literature that can support the bidirectional charge transfer required for HV pulse drivers without suffering from reversion loss and threshold voltage loss or causing overstress on transistors. A ten-stage CP-based HV pulse driver is implemented in a 65-nm CMOS process utilizing this cell. Postlayout simulation results demonstrate that it can reliably generate 20-V HV pulses from a 2.5-V supply for a 15 pF // 200 k$\Omega$ load at 55 kHz. The driver exhibits a peak power efficiency of 46.4% and occupies an area of 0.262 mm$^2$.
Citations: 0