
Latest Publications: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

A Fast Floating-Point Multiply–Accumulator Optimized for Sparse Linear Algebra on FPGAs
IF 3.1 | CAS Tier 2, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-23 | DOI: 10.1109/TVLSI.2025.3578619
Kun Li;Xiangyu Hao;Zhenguo Ma;Feng Yu;Bo Zhang;Qianjian Xing
This brief presents a pipelined floating-point Multiply–Accumulator (FPMAC) architecture designed to accelerate sparse linear algebra operations. By designing a lookup-table-based 5–3 carry-save adder (CSA) and combining it with a 3–2 CSA, the proposed design minimizes the critical path and boosts operational speed. Moreover, the proposed architecture takes advantage of data characteristics in sparse linear algebra to displace the shift unit in the critical accumulation loop, further increasing the throughput rate. In addition, the integration of a lookup-table-based leading-zero anticipator (LZA) enhances normalization efficiency. Experimental results show that, compared with reported FPMAC designs, the proposed architecture may achieve a significantly higher maximum clock frequency for single-precision floating-point operations.
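The 3:2 carry-save compression the brief builds on can be illustrated in a few lines. This is a generic software sketch of CSA behavior (our illustration, not the paper's lookup-table-based 5:3 design): three addends are reduced to a sum word and a carry word with no carry propagation between bit positions.

```python
def csa_3_2(a: int, b: int, c: int):
    """3:2 carry-save compression: reduce three addends to a partial-sum
    word and a carry word without propagating carries across bits."""
    s = a ^ b ^ c                                  # per-bit sum
    carry = ((a & b) | (b & c) | (a & c)) << 1     # per-bit majority, weight x2
    return s, carry

s, carry = csa_3_2(13, 7, 5)
assert s + carry == 13 + 7 + 5  # the output pair preserves the total
```

A final carry-propagate add of the two outputs recovers the exact sum, which is why CSA trees shorten the critical path of multi-operand accumulation.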
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2592–2596.
Citations: 0
RISC-V-Based GPGPU With Vector Capabilities for High-Performance Computing
IF 2.8 | CAS Tier 2, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-23 | DOI: 10.1109/TVLSI.2025.3574427
Jingzhou Li;Fangfei Yu;Mingyuan Ma;Wei Liu;Yuhan Wang;Hualin Wu;Hu He
General-purpose graphics processing units (GPGPUs) have become a leading platform for accelerating modern compute-intensive applications, such as large language models and generative artificial intelligence (AI). However, the lack of advanced open-source GPGPU microarchitectures has hindered high-performance research in this area. In this article, we present Ventus, a high-performance open-source GPGPU implementation built upon the RISC-V architecture with vector extension [RISC-V vector (RVV)]. Ventus introduces customized instructions and a comprehensive software toolchain to optimize performance. We deployed the design on a field programmable gate array (FPGA) platform consisting of 4 Xilinx VU19P devices, scaling up to 16 streaming multiprocessors (SMs) and supporting 256 warps. Experimental results demonstrate that Ventus exhibits key performance features comparable to commercial GPGPUs, achieving an average of 83.9% instruction reduction and 87.4% cycle per instruction (CPI) improvement over the leading open-source alternatives. Under 4-, 8-, and 16-thread configurations, Ventus maintains robust instruction per cycle (IPC) performance with values of 0.47, 0.40, and 0.32, respectively. In addition, the tensor core of Ventus attains an extra average reduction of 69.1% in instruction count and a 68.4% cycle reduction ratio when running AI-related workloads. These findings highlight Ventus as a promising solution for future high-performance GPGPU research and development, offering a robust open-source alternative to proprietary solutions. Ventus can be found on https://github.com/THU-DSP-LAB/ventus-gpgpu
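For intuition, the reported IPC figures convert directly to CPI, since CPI is the reciprocal of IPC. The conversion below is ours, using only the numbers quoted above:

```python
# Reported Ventus IPC under 4-, 8-, and 16-thread configurations.
ipc = {4: 0.47, 8: 0.40, 16: 0.32}
cpi = {t: 1.0 / v for t, v in ipc.items()}  # CPI = 1 / IPC
for t in sorted(cpi):
    print(f"{t:2d} threads: IPC = {ipc[t]:.2f}, CPI = {cpi[t]:.2f}")
```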
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 8, pp. 2239–2251.
Citations: 0
A Sample-and-Hold-Based 453-ps True Time Delay Circuit With a Wide Bandwidth of 0.5–2.5 GHz in 65-nm CMOS
IF 2.8 | CAS Tier 2, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-20 | DOI: 10.1109/TVLSI.2025.3578959
Chuanjie Chen;Xiangyu Meng;Wang Xie;Baoyong Chi
Delay solutions at high frequencies typically involve switched transmission lines or all-pass filters. These solutions often suffer from significant insertion loss and drastic gain variation at high frequencies, along with poor delay flatness. In this work, we design a high-frequency delay circuit featuring excellent delay flatness, fine delay resolution, and a wide bandwidth. In this design, a multistage cascaded sampling circuit generates the delays. By introducing differential clocks or three-phase clocks, simple coarse or fine delay can be achieved. The measurement results show that the sample-and-hold circuit achieves a delay accuracy of 17.5 ps and a delay range of 453 ps within 0.5–2.5 GHz, with a gain of −2.7 to 2 dB and a gain variation of ±0.85 dB, a delay variation of less than 7.5 ps, a power consumption of 111 mW, and a core area of 0.137 mm².
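As a quick sanity check (our arithmetic, not a figure from the paper), the quoted delay range and delay accuracy imply roughly this many discrete delay settings:

```python
import math

# Back-of-the-envelope: number of discrete settings a 453-ps total range
# implies at a 17.5-ps step size.
delay_range_ps = 453.0
step_ps = 17.5
n_steps = delay_range_ps / step_ps
print(f"~{n_steps:.1f} steps, i.e. about {math.floor(n_steps)} discrete delay settings")
```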
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 8, pp. 2344–2348.
Citations: 0
Accelerating Unstructured Sparse DNNs via Multilevel Partial Sum Reduction and PE Array-Level Load Balancing
IF 2.8 | CAS Tier 2, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-20 | DOI: 10.1109/TVLSI.2025.3577626
Chendong Xia;Qiang Li;Zhi Li;Bing Li;Huidong Zhao;Shushan Qiao
Unstructured pruning introduces significant sparsity in deep neural networks (DNNs), enhancing accelerator hardware efficiency. However, three critical challenges constrain performance gains: 1) complex fetching logic for nonzero (NZ) data pairs; 2) load imbalance across processing elements (PEs); and 3) PE stalls from write-back contention. This brief proposes an energy-efficient accelerator addressing these inefficiencies through three innovations. First, we propose a Cartesian-product output-row-stationary (CPORS) dataflow that inherently matches NZ data pairs by sequentially fetching compressed data. Second, a multilevel partial sum reduction (MLPR) strategy minimizes write-back traffic and converts random PE stalls into manageable load imbalance. Third, a kernel sorting and load scheduling (KSLS) mechanism resolves PE idle/stall and achieves PE array-level load balancing, attaining 76.6% average PE utilization across all sparsity levels. Implemented in 22-nm CMOS, the accelerator delivers 1.85× speedup and 1.4× energy efficiency over baseline and achieves 25.8 TOPS/W peak energy efficiency at 90% sparsity.
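The Cartesian-product idea can be sketched in software: every nonzero fetched sequentially from one compressed stream is multiplied with every nonzero from the other, and each product lands at a unique output coordinate, so no index-matching logic is needed before the multiply. A minimal sketch follows; the function name and (value, coordinate) layout are our assumptions for illustration, not the paper's RTL.

```python
def cartesian_product_matmul(acts, weights, out_shape):
    """Outer-product partial-sum generation from compressed sparse streams.

    acts:    list of (value, row) nonzeros from one input column
    weights: list of (value, col) nonzeros from the matching weight row
    """
    rows, cols = out_shape
    out = [[0.0] * cols for _ in range(rows)]
    for a_val, r in acts:            # fetch compressed nonzeros in order
        for w_val, c in weights:     # pair with every nonzero weight
            out[r][c] += a_val * w_val  # each pair hits a unique (r, c)
    return out

out = cartesian_product_matmul([(2.0, 0), (3.0, 2)], [(4.0, 1)], (3, 2))
```

Because zeros never enter the streams, no multiply is wasted, which is the property the CPORS dataflow exploits.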
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 8, pp. 2329–2333.
Citations: 0
IDWA: A Importance-Driven Weight Allocation Algorithm for Low Write–Verify Ratio RRAM-Based In-Memory Computing
IF 3.1 | CAS Tier 2, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-20 | DOI: 10.1109/TVLSI.2025.3578388
Jingyuan Qu;Debao Wei;Dejun Zhang;Yanlong Zeng;Zhelong Piao;Liyan Qiao
Resistive random access memory (RRAM)-based in-memory computing (IMC) architectures are currently receiving widespread attention. Since this computing approach relies on the analog characteristics of the devices, the write variation of RRAM can affect the computational accuracy to varying degrees. Conventional write–verify (W&V) procedures are performed on all weight parameters, resulting in significant time overhead. To address this issue, we propose a training algorithm that can recover the offline IMC accuracy impacted by write variation at a lower cost of W&V overhead. We introduce an importance-driven weight allocation (IDWA) algorithm during the training process of the neural network. This algorithm constrains the values of less important weights to suppress the diffusion of variation interference on this part of the weights, thus reducing unnecessary accuracy degradation. Additionally, we employ a layer-wise optimization algorithm to identify important weights in the neural network for W&V operations. Extensive testing across various deep neural network (DNN) architectures and datasets demonstrates that our proposed selective W&V methodology consistently outperforms current state-of-the-art selective W&V techniques in both accuracy preservation and computational efficiency. At the same accuracy levels, it delivers a speed improvement of 6×–32× compared to other advanced methods.
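The selective write–verify idea can be sketched as follows. The function name, the importance scores, and the Gaussian write-variation model are our assumptions for illustration, not the paper's IDWA algorithm: only the most important weights receive the expensive write–verify loop, while the rest are written once and their variation is tolerated.

```python
import random

def program_with_selective_wv(weights, importance, wv_fraction=0.1, sigma=0.05):
    """Program a weight array, write-verifying only the top-importance fraction.

    Write variation is modeled as additive Gaussian noise; write-verify is
    idealized as converging exactly to the target value.
    """
    k = max(1, int(len(weights) * wv_fraction))
    protected = set(sorted(range(len(weights)),
                           key=lambda i: -importance[i])[:k])
    programmed = []
    for i, w in enumerate(weights):
        written = w + random.gauss(0.0, sigma)  # one-shot write with variation
        if i in protected:
            written = w                         # W&V loop: re-write until exact
        programmed.append(written)
    return programmed
```

With `wv_fraction=0.1`, only 10% of cells pay the W&V time cost, which is the source of the speedup the abstract reports.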
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2508–2517.
Citations: 0
A 28 nm Dual-Mode SRAM-CIM Macro With Local Computing Cell for CNNs and Grayscale Edge Detection
IF 2.8 | CAS Tier 2, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-18 | DOI: 10.1109/TVLSI.2025.3578319
Chunyu Peng;Xiaohang Chen;Mengya Gao;Jiating Guo;Lijun Guan;Chenghu Dai;Zhiting Lin;Xiulong Wu
With the rise of artificial intelligence (AI), neural network applications place growing demands on efficient data movement. The traditional von Neumann architecture can no longer keep pace with modern technological needs. Computing-in-memory (CIM) is a promising solution to this bottleneck. This work introduces a local computing cell (LCC) scheme based on compact 6T-SRAM cells. The proposed circuit enhances energy efficiency and reduces power consumption by reusing the LCC. The LCC circuit performs the multiplication of a 2-bit input with a 1-bit weight, supporting the multiply-accumulate (MAC) operations of convolutional neural networks (CNNs). Through circuit reuse, it can also perform multibit multiply operations (2-bit input multiplication with 1-bit weight addition), which can be applied to grayscale edge detection in images. The SRAM-CIM macro achieves an energy efficiency of 46.3 TOPS/W for MAC operations with 8-bit input and 8-bit weight precision, and up to 389.1–529.1 TOPS/W for computation in one subarray with 2-bit input and 1-bit weight precision. The estimated inference accuracy on the CIFAR-10 dataset is 90.21%.
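The macro's basic primitive, products of 2-bit inputs and 1-bit weights accumulated into one sum, can be modeled functionally in a few lines (a software sketch of ours, not the circuit):

```python
def mac_2b_1b(inputs, weights):
    """Multiply-accumulate of 2-bit unsigned inputs with 1-bit weights."""
    assert all(0 <= x <= 3 for x in inputs)   # 2-bit unsigned range
    assert all(w in (0, 1) for w in weights)  # binary weights
    return sum(x * w for x, w in zip(inputs, weights))

acc = mac_2b_1b([3, 1, 2, 0], [1, 0, 1, 1])  # 3*1 + 1*0 + 2*1 + 0*1 = 5
```

Higher precisions (e.g., the 8-bit mode quoted above) are typically built by bit-slicing inputs and weights into such low-precision partial MACs and shift-adding the results.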
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 8, pp. 2264–2273.
Citations: 0
A 10-bit 50-MS/s Radiation Tolerant Split Coarse/Fine SAR ADC in 65-nm CMOS
IF 2.8 | CAS Tier 2, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-17 | DOI: 10.1109/TVLSI.2025.3576998
Ming Yan;Jaime Cardenas Chavez;Kamal El-Sankary;Li Chen;Xiaotong Lu
This article presents a 10-bit radiation-hardened-by-design (RHBD) SAR analog-to-digital converter (ADC) operating at 50 MS/s, designed for aerospace applications in high-radiation environments. System- and circuit-level redundancy techniques are implemented to mitigate radiation-induced errors and metastability. A novel split coarse/fine asynchronous SAR ADC architecture provides system-level redundancy. At the circuit level, single-event effect (SEE) error detection and radiation-hardened techniques are implemented. Our co-designed SEE error-detection scheme includes last-bit-cycle (LBC) detection following the LSB cycle and metastability detection (MD) via a ramp generator with a threshold trigger. This approach detects and corrects radiation-induced errors using a coarse/fine redundant algorithm. Radiation-hardened latch comparators and D flip-flops (DFFs) are incorporated to further mitigate SEEs. The prototype is fabricated in TSMC 65-nm technology, with an ADC core area of 0.0875 mm² and a power consumption of 2.79 mW from a 1.2-V supply. Postirradiation tests confirm functionality up to a 100-krad(Si) total ionizing dose (TID) and demonstrate over 90% suppression of large SEEs under laser testing.
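The successive-approximation loop that underlies any SAR ADC, and around which the coarse/fine split and SEE detection are built, can be modeled in a few lines. This is an idealized sketch of ours that ignores redundancy, noise, and the coarse/fine partition:

```python
def sar_convert(vin, vref=1.0, bits=10):
    """Idealized SAR conversion: binary search from MSB to LSB.

    Each bit cycle trial-sets one bit, compares the input against the
    corresponding DAC level, and keeps the bit if the input is above it.
    """
    code = 0
    for b in range(bits - 1, -1, -1):
        trial = code | (1 << b)
        if vin >= trial * vref / (1 << bits):  # comparator decision
            code = trial                       # keep this bit
    return code

code = sar_convert(0.3, vref=1.0, bits=10)  # ~floor(0.3 * 1024)
```

Metastability hazards arise precisely at the comparator decision above when `vin` sits near a DAC level, which is what the paper's MD circuit is designed to flag.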
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 8, pp. 2132–2142.
Citations: 0
A High-Density Energy-Efficient CNM Macro Using Hybrid RRAM and SRAM for Memory-Bound Applications
IF 2.8 | CAS Tier 2, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-06-17 | DOI: 10.1109/TVLSI.2025.3576889
Jun Wang;Shengzhe Yan;Xiangqu Fu;Zhihang Qian;Zhi Li;Zeyu Guo;Zhuoyu Dai;Zhaori Cong;Chunmeng Dou;Feng Zhang;Jinshan Yue;Dashan Shang
The big data era has facilitated various memory-centric algorithms, such as the Transformer decoder, neural network, stochastic computing (SC), and genetic sequence matching, which impose high demands on memory capacity, bandwidth, and access power consumption. The emerging nonvolatile memory devices and compute-near-memory (CNM) architecture offer a promising solution for memory-bound tasks. This work proposes a hybrid resistive random access memory (RRAM) and static random access memory (SRAM) CNM architecture. The main contributions include: 1) proposing an energy-efficient and high-density CNM architecture based on the hybrid integration of RRAM and SRAM arrays; 2) designing low-power CNM circuits using the logic gates and dynamic-logic adder with configurable datapath; and 3) proposing a broadcast mechanism with output-stationary workflow to reduce memory access. The proposed RRAM-SRAM CNM architecture and dataflow tailored for four distinct applications are evaluated at a 28-nm technology, achieving 4.62-TOPS/W energy efficiency and 1.20-Mb/mm² memory density, which shows 11.35×–25.81× and 1.44×–4.92× improvement compared to previous works, respectively.
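The broadcast mechanism with an output-stationary workflow can be illustrated with a matrix-vector product: accumulators stay resident ("stationary") while each input element is broadcast to all of them, so it is fetched from memory only once. This is a software sketch of ours, not the macro's datapath:

```python
def output_stationary_matvec(matrix, vector):
    """Matrix-vector product with resident accumulators and input broadcast."""
    acc = [0] * len(matrix)            # one stationary accumulator per output
    for j, x in enumerate(vector):     # fetch vector[j] once, broadcast it
        for i, row in enumerate(matrix):
            acc[i] += row[j] * x       # every accumulator consumes the broadcast
    return acc

y = output_stationary_matvec([[1, 2], [3, 4]], [5, 6])  # [17, 39]
```

Each vector element is read exactly once regardless of the number of outputs, which is the memory-access saving the broadcast mechanism targets.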
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 8, pp. 2339–2343.
Citations: 0
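The broadcast mechanism with an output-stationary workflow described in the abstract above can be sketched in a few lines. This is a minimal illustrative model in Python (an assumption for illustration, not the paper's circuit): each output accumulator stays fixed while each broadcast input streams past it once, so partial sums never travel back to memory.

```python
# Output-stationary matrix-vector sketch: accumulators stay local,
# each input element is broadcast once to every row's accumulator.

def output_stationary_matvec(weights, inputs):
    """Compute y = W @ x with one stationary accumulator per output row."""
    acc = [0] * len(weights)            # accumulators stay put ("stationary")
    for j, x in enumerate(inputs):      # each input is broadcast once ...
        for i, row in enumerate(weights):
            acc[i] += row[j] * x        # ... to every row's accumulator
    return acc

W = [[1, 2], [3, 4]]
x = [5, 6]
print(output_stationary_matvec(W, x))   # [17, 39]
```

Because every partial sum stays in its accumulator until the final result is ready, each output is written to memory exactly once, which is the memory-access reduction the abstract refers to.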
Test Primitives: The Unified Notation for Characterizing March Test Sequences
IF 3.1 2区 工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-06-17 DOI: 10.1109/TVLSI.2025.3577448
Ruiqi Zhu;Houjun Wang;Susong Yang;Weikun Xie;Yindong Xiao
March algorithms are essential for detecting functional memory faults, characterized by their linear complexity and adaptability to emerging technologies. However, the increasing complexity of fault types presents significant challenges to existing fault detection models regarding analytical efficiency and adaptability. This article introduces the test primitive (TP), a unified notation that characterizes March test sequences through a novel methodology that decouples fault detection operations from sensitization states. The proposed TP achieves platform independence and seamless integration of fault models, supported by rigorous theoretical proofs. These proofs establish the fundamental properties of the TP in terms of completeness, uniqueness, and conciseness, providing a theoretical foundation that ensures the decoupling method reduces the computational complexity of March algorithm analysis to $O(1)$ . This reduction is analogous to Karnaugh map simplification in digital logic while enabling millisecond-level automated analysis. Experimental results demonstrate that the proposed method significantly enhances both analyzable fault coverage (FC) and detection accuracy, thereby addressing critical limitations of existing fault detection models.
{"title":"Test Primitives: The Unified Notation for Characterizing March Test Sequences","authors":"Ruiqi Zhu;Houjun Wang;Susong Yang;Weikun Xie;Yindong Xiao","doi":"10.1109/TVLSI.2025.3577448","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3577448","url":null,"abstract":"March algorithms are essential for detecting functional memory faults, characterized by their linear complexity and adaptability to emerging technologies. However, the increasing complexity of fault types presents significant challenges to existing fault detection models regarding analytical efficiency and adaptability. This article introduces the test primitive (TP), a unified notation that characterizes March test sequences through a novel methodology that decouples fault detection operations from sensitization states. The proposed TP achieves platform independence and seamless integration of fault models, supported by rigorous theoretical proofs. These proofs establish the fundamental properties of the TP in terms of completeness, uniqueness, and conciseness, providing a theoretical foundation that ensures the decoupling method reduces the computational complexity of March algorithm analysis to <inline-formula> <tex-math>$O(1)$ </tex-math></inline-formula>. This reduction is analogous to Karnaugh map simplification in digital logic while enabling millisecond-level automated analysis. 
Experimental results demonstrate that the proposed method significantly enhances both analyzable fault coverage (FC) and detection accuracy, thereby addressing critical limitations of existing fault detection models.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 9","pages":"2542-2555"},"PeriodicalIF":3.1,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144904755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
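The March test sequences that the test-primitive (TP) notation characterizes share a simple operational pattern: a sequence of march elements, each an address direction plus a list of read/write operations applied per cell. The sketch below is a hypothetical Python model (not the article's TP notation) that runs the classic March C- algorithm over a small memory and shows how a stuck-at fault injected into one cell is caught by a later read element.

```python
# March C-: ⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)
# Element format: (direction, [operations]); 'r0' = read expecting 0, 'w1' = write 1.

MARCH_C_MINUS = [
    ("up",   ["w0"]),
    ("up",   ["r0", "w1"]),
    ("up",   ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up",   ["r0"]),
]

def run_march(memory_size, fault_cell=None, stuck_value=None):
    """Apply March C-; return the addresses where a read mismatched."""
    mem = [0] * memory_size
    failures = set()

    def write(addr, val):
        # A stuck-at fault ignores writes and always holds stuck_value.
        mem[addr] = stuck_value if addr == fault_cell else val

    def read(addr):
        return stuck_value if addr == fault_cell else mem[addr]

    for direction, ops in MARCH_C_MINUS:
        addrs = (range(memory_size) if direction == "up"
                 else range(memory_size - 1, -1, -1))
        for addr in addrs:
            for op in ops:
                kind, bit = op[0], int(op[1])
                if kind == "w":
                    write(addr, bit)
                elif read(addr) != bit:   # read and compare to expected value
                    failures.add(addr)
    return sorted(failures)

print(run_march(16))                               # [] — fault-free memory passes
print(run_march(16, fault_cell=5, stuck_value=0))  # [5] — stuck-at-0 cell is caught
```

Note the linear complexity the abstract mentions: each element visits every address exactly once, so the total operation count is proportional to the memory size times the (constant) number of operations in the sequence.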
Efficient Design Space Exploration for the BOOM Using SAC-Based Reinforcement Learning
IF 2.8 2区 工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-06-17 DOI: 10.1109/TVLSI.2025.3572799
Mingjun Cheng;Shihan Zhang;Xin Zheng;Xian Lin;Huaien Gao;Shuting Cai;Xiaoming Xiong;Bei Yu
Design space exploration (DSE) is crucial for optimizing the performance, power, and area (PPA) of CPU microarchitectures ( $mu $ -archs). While various machine learning (ML) algorithms have been applied to the $mu $ -arch DSE problem, the potential of reinforcement learning (RL) remains underexplored. In this article, we propose a novel RL-based approach to address the reduced instruction set computer V (RISC-V) CPU $mu $ -arch DSE problem. This approach enables dynamic selection and optimization of $mu $ -arch parameters without relying on predefined modification sequences, thus significantly enhancing exploration flexibility. To address the challenges posed by high-dimensional action spaces and sparse rewards, we use a discrete soft actor-critic (SAC) framework with entropy maximization to promote efficient exploration. In addition, we integrate multistep temporal-difference (TD) learning, an experience replay (ER) buffer, and return normalization to improve sample efficiency and learning stability during training. Our method further aligns optimization with user-defined preferences by normalizing PPA metrics relative to baseline designs. Experimental results on the Berkeley out-of-order machine (BOOM) demonstrate that the proposed approach achieves superior performance compared with state-of-the-art methods, showcasing its effectiveness and efficiency for $mu $ -arch DSE. Our code is available at https://github.com/exhaust-create/SAC-DSE.
{"title":"Efficient Design Space Exploration for the BOOM Using SAC-Based Reinforcement Learning","authors":"Mingjun Cheng;Shihan Zhang;Xin Zheng;Xian Lin;Huaien Gao;Shuting Cai;Xiaoming Xiong;Bei Yu","doi":"10.1109/TVLSI.2025.3572799","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3572799","url":null,"abstract":"Design space exploration (DSE) is crucial for optimizing the performance, power, and area (PPA) of CPU microarchitectures (<inline-formula> <tex-math>$mu $ </tex-math></inline-formula>-archs). While various machine learning (ML) algorithms have been applied to the <inline-formula> <tex-math>$mu $ </tex-math></inline-formula>-arch DSE problem, the potential of reinforcement learning (RL) remains underexplored. In this article, we propose a novel RL-based approach to address the reduced instruction set computer V (RISC-V) CPU <inline-formula> <tex-math>$mu $ </tex-math></inline-formula>-arch DSE problem. This approach enables dynamic selection and optimization of <inline-formula> <tex-math>$mu $ </tex-math></inline-formula>-arch parameters without relying on predefined modification sequences, thus significantly enhancing exploration flexibility. To address the challenges posed by high-dimensional action spaces and sparse rewards, we use a discrete soft actor-critic (SAC) framework with entropy maximization to promote efficient exploration. In addition, we integrate multistep temporal-difference (TD) learning, an experience replay (ER) buffer, and return normalization to improve sample efficiency and learning stability during training. Our method further aligns optimization with user-defined preferences by normalizing PPA metrics relative to baseline designs. Experimental results on the Berkeley out-of-order machine (BOOM) demonstrate that the proposed approach achieves superior performance compared with state-of-the-art methods, showcasing its effectiveness and efficiency for <inline-formula> <tex-math>$mu $ </tex-math></inline-formula>-arch DSE. 
Our code is available at <uri>https://github.com/exhaust-create/SAC-DSE</uri>.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2252-2263"},"PeriodicalIF":2.8,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144702123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
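The abstract's idea of aligning optimization with user-defined preferences by normalizing PPA metrics relative to a baseline design can be illustrated with a toy reward function. This is a minimal sketch under stated assumptions (the function name, weights, and sign convention are illustrative, not the paper's code): dividing each metric by its baseline value makes the three quantities dimensionless, so preference weights can combine them into a single scalar reward.

```python
# Toy PPA reward: normalize each metric against a baseline design so that
# user preference weights combine dimensionless quantities.

def ppa_reward(perf, power, area, baseline, weights=(0.5, 0.3, 0.2)):
    """Higher is better: reward performance, penalize power and area."""
    w_perf, w_power, w_area = weights
    return (w_perf * (perf / baseline["perf"])        # reward higher performance
            - w_power * (power / baseline["power"])   # penalize higher power
            - w_area * (area / baseline["area"]))     # penalize larger area

baseline = {"perf": 1.0, "power": 1.0, "area": 1.0}
# A candidate with +20% performance, -10% power, and unchanged area
# scores above the baseline design (whose reward is ~0 by construction).
print(ppa_reward(1.2, 0.9, 1.0, baseline) > ppa_reward(1.0, 1.0, 1.0, baseline))  # True
```

Shifting the weights toward power or area would steer the agent toward low-power or compact designs instead, which is how user preferences enter the optimization.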