IEEE Journal on Emerging and Selected Topics in Circuits and Systems: Latest Publications

Guest Editorial Generative Artificial Intelligence Compute: Algorithms, Implementations, and Applications to CAS
IF 3.7 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-06-25 · DOI: 10.1109/JETCAS.2025.3572258
Chuan Zhang;Naigang Wang;Jongsun Park;Li Zhang
{"title":"Guest Editorial Generative Artificial Intelligence Compute: Algorithms, Implementations, and Applications to CAS","authors":"Chuan Zhang;Naigang Wang;Jongsun Park;Li Zhang","doi":"10.1109/JETCAS.2025.3572258","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3572258","url":null,"abstract":"","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"144-148"},"PeriodicalIF":3.7,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11050018","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generative AI Through CAS Lens: An Integrated Overview of Algorithmic Optimizations, Architectural Advances, and Automated Designs
IF 3.7 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-06-06 · DOI: 10.1109/JETCAS.2025.3575272
Chuan Zhang;You You;Naigang Wang;Jongsun Park;Li Zhang
Generative artificial intelligence (GenAI) has emerged as a pivotal focus in global innovation agendas, revealing transformative potential that extends beyond technological applications to reshape diverse societal domains. Given the fundamental dependency of GenAI deployment on circuits and systems (CAS), a co-evolutionary approach integrating both technological paradigms becomes imperative. This synergistic framework confronts three interrelated challenges: 1) developing deployment-ready GenAI algorithms, 2) engineering implementation-efficient CAS architectures, and 3) leveraging GenAI for autonomous CAS design - each representing a critical innovation vector. Given the rapid advancement of GenAI-CAS technologies, a comprehensive synthesis has become an urgent priority across academia and industry. Consequently, this timely review systematically analyzes current advancements, provides integrative perspectives, and identifies emerging research trajectories. This review endeavors to serve both the AI and CAS communities, thereby catalyzing an innovation feedback loop: GenAI-optimized CAS architectures in turn accelerate GenAI evolution through algorithm-hardware co-empowerment.
{"title":"Generative AI Through CAS Lens: An Integrated Overview of Algorithmic Optimizations, Architectural Advances, and Automated Designs","authors":"Chuan Zhang;You You;Naigang Wang;Jongsun Park;Li Zhang","doi":"10.1109/JETCAS.2025.3575272","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3575272","url":null,"abstract":"Generative artificial intelligence (GenAI) has emerged as a pivotal focus in global innovation agendas, revealing transformative potential that extends beyond technological applications to reshape diverse societal domains. Given the fundamental dependency of GenAI deployment on circuits and systems (CAS), a co-evolutionary approach integrating both technological paradigms becomes imperative. This synergistic framework confronts three interrelated challenges: 1) developing deployment-ready GenAI algorithms, 2) engineering implementation-efficient CAS architectures, and 3) leveraging GenAI for autonomous CAS designs - each representing critical innovations vectors. Given the rapid advancement of GenAI-CAS technologies, a comprehensive synthesis has become an urgent priority across academia and industry. Consequently, this timely review systematically analyzes current advancements, provides integrative perspectives, and identifies emerging research trajectories. This review endeavors to serve both AI and CAS communities, thereby catalyzing an innovation feedback loop: GenAI-optimized CAS architectures in turn accelerate GenAI evolution through algorithm-hardware co-empowerment.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"149-185"},"PeriodicalIF":3.7,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11024158","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
RDDM: A Rate-Distortion Guided Diffusion Model for Learned Image Compression Enhancement
IF 3.7 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-22 · DOI: 10.1109/JETCAS.2025.3563228
Sanxin Jiang;Jiro Katto;Heming Sun
Currently, denoising diffusion probabilistic models (DDPMs) have achieved significant success in various image generation tasks, but their application to image compression, especially in the context of learned image compression (LIC), remains quite limited. In this study, we introduce a rate-distortion (RD) guided diffusion model, referred to as RDDM, to enhance the performance of LIC. In RDDM, LIC is treated as a lossy codec function constrained by RD, dividing the input image into two parts through encoding and decoding operations: the reconstructed image and the residual image. The construction of RDDM rests primarily on two points. First, RDDM treats diffusion models as repositories of image structures and textures, built from extensive real-world datasets. Under the guidance of RD constraints, it extracts and utilizes the necessary structural and textural priors from these repositories to restore the input image. Second, RDDM employs a Bayesian network to progressively infer the input image based on the reconstructed image and its codec function. Additionally, our research reveals that RDDM's performance declines when its codec function does not match the reconstructed image. However, using the highest-bitrate codec function minimizes this performance drop. The resulting model is referred to as RDDM*. The experimental results indicate that both RDDM and RDDM* can be applied to various LIC architectures, such as CNN-based, Transformer-based, and hybrid designs. They can significantly improve the fidelity of these codecs while maintaining or even enhancing perceptual quality to some extent.
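To make the codec-split formulation above concrete, here is a minimal NumPy sketch of dividing an input into a reconstruction plus a residual under a lossy codec. The `toy_encode`/`toy_decode` functions are hypothetical stand-ins for a real LIC codec, and the sketch deliberately omits RDDM's diffusion model and RD guidance.

```python
import numpy as np

def codec_split(x, encode, decode):
    """Split an image into (reconstruction, residual) under a lossy codec.

    In RDDM's formulation, the diffusion model is then asked to recover the
    residual part, guided by rate-distortion constraints (not modeled here).
    """
    x_hat = decode(encode(x))   # lossy reconstruction produced by the codec
    residual = x - x_hat        # the information the codec failed to preserve
    return x_hat, residual

# Toy stand-in "codec": 8x nearest-neighbor downsample/upsample.
def toy_encode(x):
    return x[::8, ::8]

def toy_decode(code):
    return np.repeat(np.repeat(code, 8, axis=0), 8, axis=1)

img = np.random.rand(64, 64).astype(np.float32)
recon, res = codec_split(img, toy_encode, toy_decode)
assert np.allclose(recon + res, img)  # the two parts compose the input exactly
```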
{"title":"RDDM: A Rate-Distortion Guided Diffusion Model for Learned Image Compression Enhancement","authors":"Sanxin Jiang;Jiro Katto;Heming Sun","doi":"10.1109/JETCAS.2025.3563228","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3563228","url":null,"abstract":"Currently, denoising diffusion probability models (DDPM) have achieved significant success in various image generation tasks, but their application in image compression, especially in the context of learned image compression (LIC), is quite limited. In this study, we introduce a rate-distortion (RD) guided diffusion model, referred to as RDDM, to enhance the performance of LIC. In RDDM, LIC is treated as a lossy codec function constrained by RD, dividing the input image into two parts through encoding and decoding operations: the reconstructed image and the residual image. The construction of RDDM is primarily based on two points. First, RDDM treats diffusion models as repositories of image structures and textures, built using extensive real-world datasets. Under the guidance of RD constraints, it extracts and utilizes the necessary structural and textural priors from these repositories to restore the input image. Second, RDDM employs a Bayesian network to progressively infer the input image based on the reconstructed image and its codec function. Additionally, our research reveals that RDDM’s performance declines when its codec function does not match the reconstructed image. However, using the highest bitrate codec function minimizes this performance drop. The resulting model is referred to as <inline-formula> <tex-math>$text{RDDM}^{star }$ </tex-math></inline-formula>. The experimental results indicate that both RDDM and <inline-formula> <tex-math>$text{RDDM}^{star }$ </tex-math></inline-formula> can be applied to various architectures of LICs, such as CNN, Transformer, and their hybrid. They can significantly improve the fidelity of these codecs while maintaining or even enhancing perceptual quality to some extent.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"186-199"},"PeriodicalIF":3.7,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Flexible Template for Edge Generative AI With High-Accuracy Accelerated Softmax and GELU
IF 3.7 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-21 · DOI: 10.1109/JETCAS.2025.3562734
Andrea Belano;Yvan Tortorella;Angelo Garofalo;Luca Benini;Davide Rossi;Francesco Conti
Transformer-based generative Artificial Intelligence (GenAI) models achieve remarkable results in a wide range of fields, including natural language processing, computer vision, and audio processing. However, this comes at the cost of increased complexity and the need for sophisticated non-linearities such as softmax and GELU. Even though Transformers are computationally dominated by matrix multiplications (MatMul), these non-linearities can become a performance bottleneck, especially when dedicated hardware is used to accelerate the MatMul operators. In this work, we introduce a GenAI BFloat16 Transformer acceleration template based on a heterogeneous tightly-coupled cluster containing 256 KiB of shared SRAM, 8 general-purpose RISC-V cores, a 24×8 systolic-array MatMul accelerator, and SoftEx, a novel accelerator for the Transformer softmax, GELU, and SiLU non-linearities. SoftEx introduces an approximate exponentiation algorithm balancing efficiency (121× speedup over glibc's implementation) with accuracy (mean relative error of 0.14%). In 12 nm technology, SoftEx occupies 0.039 mm², only 3.22% of the cluster, which achieves an operating frequency of 1.12 GHz. Compared to optimized software running on the RISC-V cores, SoftEx achieves significant improvements, accelerating softmax and GELU computations by up to 10.8× and 5.11×, respectively, while reducing their energy consumption by up to 10.8× and 5.29×. These enhancements translate into a 1.58× increase in throughput (310 GOPS at 0.8 V) and a 1.42× improvement in energy efficiency (1.34 TOPS/W at 0.55 V) on end-to-end ViT inference workloads.
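The abstract does not disclose SoftEx's approximate exponentiation algorithm, so the sketch below instead shows Schraudolph's classic bit-manipulation approximation of exp(), a well-known technique occupying the same accuracy/efficiency trade space. Treat it purely as an illustrative assumption, not the paper's method.

```python
import numpy as np

def fast_exp(x):
    """Schraudolph-style exp() approximation via IEEE-754 bit manipulation.

    Writes a scaled and biased copy of x directly into the exponent/mantissa
    bits of a float64. NOT SoftEx's algorithm; valid roughly for |x| < 700,
    with a maximum relative error of a few percent.
    """
    x = np.asarray(x, dtype=np.float64)
    a = 2**20 / np.log(2.0)        # converts x into exponent-field units
    b = 1023 * 2**20 - 60801       # exponent bias minus an error-centering constant
    hi = (a * x + b).astype(np.int64)   # the high 32-bit word of the double
    return (hi << 32).view(np.float64)  # low word = 0; reinterpret the bits

xs = np.linspace(-5.0, 5.0, 11)
rel_err = np.abs(fast_exp(xs) - np.exp(xs)) / np.exp(xs)
print(rel_err.max())  # on the order of a few percent
```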
{"title":"A Flexible Template for Edge Generative AI With High-Accuracy Accelerated Softmax and GELU","authors":"Andrea Belano;Yvan Tortorella;Angelo Garofalo;Luca Benini;Davide Rossi;Francesco Conti","doi":"10.1109/JETCAS.2025.3562734","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3562734","url":null,"abstract":"Transformer-based generative Artificial Intelligence (GenAI) models achieve remarkable results in a wide range of fields, including natural language processing, computer vision, and audio processing. However, this comes at the cost of increased complexity and the need of sophisticated non-linearities such as softmax and GELU. Even if Transformers are computationally dominated by matrix multiplications (MatMul), these non-linearities can become a performance bottleneck, especially if dedicated hardware is used to accelerate MatMul operators. In this work, we introduce a GenAI BFloat16 Transformer acceleration template based on a heterogeneous tightly-coupled cluster containing 256KiB of shared SRAM, 8 general-purpose RISC-V cores, a <inline-formula> <tex-math>$24times 8$ </tex-math></inline-formula> systolic array MatMul accelerator, and a novel accelerator for Transformer softmax, GELU and SiLU non-linearities: SoftEx. SoftEx introduces an approximate exponentiation algorithm balancing efficiency (<inline-formula> <tex-math>$121times $ </tex-math></inline-formula> speedup over glibc’s implementation) with accuracy (mean relative error of 0.14%). In 12 nm technology, SoftEx occupies 0.039 mm<sup>2</sup>, only 3.22% of the cluster, which achieves an operating frequency of 1.12 GHz. Compared to optimized software running on the RISC-V cores, SoftEx achieves significant improvements, accelerating softmax and GELU computations by up to <inline-formula> <tex-math>$10.8times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$5.11times $ </tex-math></inline-formula>, respectively, while reducing their energy consumption by up to <inline-formula> <tex-math>$10.8times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$5.29times $ </tex-math></inline-formula>. These enhancements translate into a <inline-formula> <tex-math>$1.58times $ </tex-math></inline-formula> increase in throughput (310 GOPS at 0.8 V) and a <inline-formula> <tex-math>$1.42times $ </tex-math></inline-formula> improvement in energy efficiency (1.34 TOPS/W at 0.55 V) on end-to-end ViT inference workloads.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"200-216"},"PeriodicalIF":3.7,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive Two-Range Quantization and Hardware Co-Design for Large Language Model Acceleration
IF 3.7 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-21 · DOI: 10.1109/JETCAS.2025.3562937
Siqi Cai;Gang Wang;Wenjie Li;Dongxu Lyu;Guanghui He
Large language models (LLMs) face high computational and memory demands. While prior studies have leveraged quantization to reduce memory requirements, critical challenges persist: unaligned memory accesses, significant quantization errors when handling outliers that span larger quantization ranges, and the increased hardware overhead associated with processing high-bit-width outliers. To address these issues, we propose a quantization algorithm and hardware architecture co-design for efficient LLM acceleration. Algorithmically, a grouped adaptive two-range quantization (ATRQ) with an in-group embedded identifier is proposed to encode outliers and normal values in distinct ranges, achieving hardware-friendly aligned memory access and reducing quantization errors. From a hardware perspective, we develop a low-overhead ATRQ decoder and an outlier-bit-split processing element (PE) to reduce the hardware overhead associated with high-bit-width outliers, effectively leveraging their inherent sparsity. To support mixed-precision computation and accommodate diverse dataflows during the prefilling and decoding phases, we design a reconfigurable local accumulator that mitigates the overhead associated with additional adders. Experimental results show that the ATRQ-based accelerator outperforms existing solutions, achieving up to 2.48× speedup and 2.01× energy reduction in the LLM prefilling phase, and 1.87× speedup and 2.03× energy reduction in the decoding phase, with superior model performance under post-training quantization.
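Since the abstract does not specify ATRQ's exact encoding or memory layout, the sketch below only illustrates the general two-range idea: each group adaptively splits values at a threshold, quantizing normal values with a fine scale and outliers with a coarse one, with a per-value flag standing in for the in-group identifier. All constants and function names are illustrative assumptions.

```python
import numpy as np

def two_range_quantize(group, bits=4, outlier_frac=0.1):
    """Hedged sketch of grouped two-range quantization (not the paper's ATRQ).

    Values whose magnitude is below an adaptive per-group threshold use a
    fine 'normal' scale; the rest use a coarse 'outlier' scale.
    """
    q_max = 2**(bits - 1) - 1
    thresh = np.quantile(np.abs(group), 1.0 - outlier_frac)  # adaptive split point
    is_outlier = np.abs(group) > thresh
    s_norm = max(thresh, 1e-8) / q_max                # fine scale for normal values
    s_out = max(np.abs(group).max(), 1e-8) / q_max    # coarse scale for outliers
    scale = np.where(is_outlier, s_out, s_norm)
    q = np.clip(np.round(group / scale), -q_max - 1, q_max).astype(np.int8)
    return q, is_outlier, (s_norm, s_out)

def two_range_dequantize(q, is_outlier, scales):
    s_norm, s_out = scales
    return q * np.where(is_outlier, s_out, s_norm)

g = np.random.randn(128).astype(np.float32)
g[::32] *= 20.0                      # inject a few large outliers
q, flags, scales = two_range_quantize(g)
print(np.abs(two_range_dequantize(q, flags, scales) - g).mean())
```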
{"title":"Adaptive Two-Range Quantization and Hardware Co-Design for Large Language Model Acceleration","authors":"Siqi Cai;Gang Wang;Wenjie Li;Dongxu Lyu;Guanghui He","doi":"10.1109/JETCAS.2025.3562937","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3562937","url":null,"abstract":"Large language models (LLMs) face high computational and memory demands. While prior studies have leveraged quantization to reduce memory requirements, critical challenges persist: unaligned memory accesses, significant quantization errors when handling outliers that span larger quantization ranges, and the increased hardware overhead associated with processing high-bit-width outliers. To address these issues, we propose a quantization algorithm and hardware architecture co-design for efficient LLM acceleration. Algorithmically, a grouped adaptive two-range quantization (ATRQ) with an in-group embedded identifier is proposed to encode outliers and normal values in distinct ranges, achieving hardware-friendly aligned memory access and reducing quantization errors. From a hardware perspective, we develop a low-overhead ATRQ decoder and an outlier-bit-split processing element (PE) to reduce the hardware overhead associated with high-bit-width outliers, effectively leveraging their inherent sparsity. To support mixed-precision computation and accommodate diverse dataflows during the prefilling and decoding phases, we design a reconfigurable local accumulator that mitigates the overhead associated with additional adders. Experimental results show that the ATRQ-based accelerator outperforms existing solutions, achieving up to <inline-formula> <tex-math>$2.48times $ </tex-math></inline-formula> speedup and <inline-formula> <tex-math>$2.01times $ </tex-math></inline-formula> energy reduction in LLM prefilling phase, and <inline-formula> <tex-math>$1.87times $ </tex-math></inline-formula> speedup and <inline-formula> <tex-math>$2.03times $ </tex-math></inline-formula> energy reduction in the decoding phase, with superior model performance under post-training quantization.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"272-284"},"PeriodicalIF":3.7,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Overview of Neural Rendering Accelerators: Challenges, Trends, and Future Directions
IF 3.7 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-17 · DOI: 10.1109/JETCAS.2025.3561777
Junha Ryu;Hoi-Jun Yoo
Rapid advancements in neural rendering have revolutionized the fields of augmented reality (AR) and virtual reality (VR) by enabling photorealistic 3D modeling and rendering. However, deploying neural rendering on edge devices presents significant challenges due to computational complexity, memory inefficiencies, and energy constraints. This paper provides a comprehensive overview of neural rendering accelerators, identifying the major hardware inefficiencies across the sampling, positional encoding, and multi-layer perceptron (MLP) stages. We explore hardware-software co-optimization techniques that address these challenges and provide a summary to support in-depth analysis. Additionally, emerging trends like 3D Gaussian Splatting (3DGS) and hybrid rendering approaches are briefly introduced, highlighting their potential to improve rendering quality and efficiency. By presenting a unified analysis of challenges, solutions, and future directions, this work aims to guide the development of next-generation neural rendering accelerators, especially for resource-constrained environments.
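To ground the "positional encoding stage" named above, here is the standard frequency encoding from the original NeRF paper (general background, not something specific to this survey): each input coordinate is lifted to a bank of sin/cos features that the downstream MLP consumes.

```python
import numpy as np

def nerf_positional_encoding(x, num_freqs=10):
    """Standard NeRF frequency encoding: (sin(2^k * pi * x), cos(2^k * pi * x)).

    x: (..., d) coordinates; returns (..., d * 2 * num_freqs) features.
    Accelerators evaluate exactly these per-sample sin/cos features before
    the MLP stage.
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi       # 2^k * pi, k = 0..L-1
    xb = x[..., None] * freqs                           # (..., d, num_freqs)
    enc = np.concatenate([np.sin(xb), np.cos(xb)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

pts = np.random.rand(4, 3)                   # four 3-D sample points
print(nerf_positional_encoding(pts).shape)   # (4, 60)
```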
{"title":"An Overview of Neural Rendering Accelerators: Challenges, Trends, and Future Directions","authors":"Junha Ryu;Hoi-Jun Yoo","doi":"10.1109/JETCAS.2025.3561777","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3561777","url":null,"abstract":"Rapid advancements in neural rendering have revolutionized the fields of augmented reality (AR) and virtual reality (VR) by enabling photorealistic 3D modeling and rendering. However, deploying neural rendering on edge devices presents significant challenges due to computational complexity, memory inefficiencies, and energy constraints. This paper provides a comprehensive overview of neural rendering accelerators, identifying the major hardware inefficiencies across sampling, positional encoding, and multi-layer perception (MLP) stages. We explore hardware-software co-optimization techniques that address these challenges and provide a summary for in-depth analysis. Additionally, emerging trends like 3D Gaussian Splatting (3DGS) and hybrid rendering approaches are briefly introduced, highlighting their potential to improve rendering quality and efficiency. By presenting a unified analysis of challenges, solutions, and future directions, this work aims to guide the development of next-generation neural rendering accelerators, especially for resource-constrained environments.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"299-311"},"PeriodicalIF":3.7,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10967345","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LightRot: A Light-Weighted Rotation Scheme and Architecture for Accurate Low-Bit Large Language Model Inference
IF 3.7 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-04-08 · DOI: 10.1109/JETCAS.2025.3558300
Sangjin Kim;Yuseon Choi;Jungjun Oh;Byeongcheol Kim;Hoi-Jun Yoo
As large language models (LLMs) continue to demonstrate exceptional capabilities across various domains, the challenge of achieving energy-efficient and accurate inference becomes increasingly critical. This work presents LightRot, a lightweight rotation scheme and dedicated hardware accelerator designed for low-bit LLM inference. The proposed architecture integrates Grouped Local Rotation (GLR) and Outlier Direction Aligning (ODA) algorithms with a hierarchical Fast Hadamard Transform (FHT)-based rotation unit to address key challenges in low-bit quantization, including the energy overhead of rotation operations. The proposed accelerator, implemented in a 28 nm CMOS process, achieves a peak energy efficiency of 27.4 TOPS/W for 4-bit inference, surpassing prior state-of-the-art designs. Unlike conventional approaches that rely on higher-precision inference or are evaluated only on basic language models such as GPT-2, LightRot is optimized for advanced models such as LLaMA2-13B and LLaMA3-8B. Its performance is further validated on MT-Bench, demonstrating robust applicability to real-world conversational scenarios and redefining benchmarks for chat-based AI systems. By synergizing algorithmic innovations and hardware efficiency, this work sets a new paradigm for scalable, low-bit LLM inference, paving the way for sustainable AI advancements.
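LightRot's rotation unit is built on the Fast Hadamard Transform; the sketch below is a generic orthonormal fast Walsh-Hadamard transform showing how an O(n log n) rotation spreads a single outlier's energy across all lanes before quantization. The hierarchical, grouped details (GLR, ODA) are the paper's contributions and are not reproduced here.

```python
import numpy as np

def fht(x):
    """Orthonormal fast Walsh-Hadamard transform along the last axis.

    The length must be a power of two. With the 1/sqrt(n) scaling the
    transform matrix is orthogonal (a rotation) and is its own inverse.
    """
    shape = np.shape(x)
    n = shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    y = np.array(x, dtype=np.float64).reshape(-1, n)
    h = 1
    while h < n:
        y = y.reshape(-1, n // (2 * h), 2, h)
        a = y[:, :, 0, :].copy()
        b = y[:, :, 1, :].copy()
        y[:, :, 0, :] = a + b            # butterfly sums
        y[:, :, 1, :] = a - b            # butterfly differences
        y = y.reshape(-1, n)
        h *= 2
    return (y / np.sqrt(n)).reshape(shape)

x = np.zeros(8)
x[3] = 8.0                               # a single large outlier
print(np.abs(fht(x)).max())              # ~2.83: outlier energy is spread out
print(np.allclose(fht(fht(x)), x))       # True: the rotation is self-inverse
```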
{"title":"LightRot: A Light-Weighted Rotation Scheme and Architecture for Accurate Low-Bit Large Language Model Inference","authors":"Sangjin Kim;Yuseon Choi;Jungjun Oh;Byeongcheol Kim;Hoi-Jun Yoo","doi":"10.1109/JETCAS.2025.3558300","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3558300","url":null,"abstract":"As large language models (LLMs) continue to demonstrate exceptional capabilities across various domains, the challenge of achieving energy-efficient and accurate inference becomes increasingly critical. This work presents LightRot, a lightweight rotation scheme and dedicated hardware accelerator designed for low-bit LLM inference. The proposed architecture integrates Grouped Local Rotation (GLR) and Outlier Direction Aligning (ODA) algorithms with a hierarchical Fast Hadamard Transform (FHT)-based rotation unit to address key challenges in low-bit quantization, including the energy overhead of rotation operations. The proposed accelerator, implemented in a 28nm CMOS process, achieves a peak energy efficiency of 27.4TOPS/W for 4-bit inference, surpassing prior state-of-the-art designs. Unlike conventional approaches that rely on higher-precision inference or evaluate on basic language modeling tasks like GPT-2, LightRot is optimized for advanced models such as LLaMA2-13B and LLaMA3-8B. Its performance is further validated on MT-Bench, demonstrating robust applicability to real-world conversational scenarios and redefining benchmarks for chat-based AI systems. By synergizing algorithmic innovations and hardware efficiency, this work sets a new paradigm for scalable, low-bit LLM inference, paving the way for sustainable AI advancements.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"231-243"},"PeriodicalIF":3.7,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient Hardware Architecture Design for Rotary Position Embedding of Large Language Models
IF 3.7 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-03-31 · DOI: 10.1109/JETCAS.2025.3556443
Wenjie Li;Gang Wang;Dongxu Lyu;Ningyi Xu;Guanghui He
Due to the substantial demands of storage and computation imposed by large language models (LLMs), there has been a surge of research interest in their hardware acceleration. As a technique involving non-linear operations, rotary position embedding (RoPE) has been adopted by several recently released LLMs. However, there is currently no reported research on its hardware design. This paper, for the first time, presents an efficient hardware architecture design for the RoPE used in LLMs. We first explore the similarities between RoPE and the coordinate rotation digital computer (CORDIC) algorithm, while also considering the quantization schemes commonly used for LLMs. Additionally, we propose a hardware-friendly solution to address the issue of excessively large input angle ranges. Then we present a CORDIC-based approximation for RoPE and develop a hardware architecture for it. The experimental results demonstrate that our design can save up to 45.7% area cost and 31.0% power consumption when compared with the fixed-point counterpart, while maintaining almost the same model performance. Compared to the straightforward implementation using floating-point arithmetic, our design can reduce up to 91.4% area cost and 88.9% power consumption, with negligible performance loss.
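For reference, the sketch below gives one standard software formulation of RoPE, using the common interleaved channel-pairing convention: each channel pair (2i, 2i+1) at position pos is rotated by the angle pos * base^(-2i/d). The paper's contribution, a CORDIC-based hardware approximation of these rotations, is not modeled here.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding along the last dimension.

    x: (seq, d) with d even; positions: (seq,) token positions.
    Channel pair (2i, 2i+1) is rotated by pos * base**(-2i/d).
    """
    seq, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)     # (d/2,) per-pair frequencies
    ang = positions[:, None] * inv_freq              # (seq, d/2) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # standard 2-D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.randn(5, 8)
q_rot = rope(q, np.arange(5, dtype=np.float64))
# Rotations preserve norms, a property any hardware approximation must respect.
print(np.allclose(np.linalg.norm(q_rot, axis=1), np.linalg.norm(q, axis=1)))
```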
{"title":"Efficient Hardware Architecture Design for Rotary Position Embedding of Large Language Models","authors":"Wenjie Li;Gang Wang;Dongxu Lyu;Ningyi Xu;Guanghui He","doi":"10.1109/JETCAS.2025.3556443","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3556443","url":null,"abstract":"Due to the substantial demands of storage and computation imposed by large language models (LLMs), there has been a surge of research interest in their hardware acceleration. As a technique involving non-linear operations, rotary position embedding (RoPE) has been adopted by some recently released LLMs. However, there is currently no reported research on its hardware design. This paper, for the first time, presents an efficient hardware architecture design for RoPE of LLMs. We first explore the similarities between RoPE and the coordinate rotation digital computer (CORDIC) algorithm, while also considering the commonly used quantization scheme for LLMs. Additionally, we propose a hardware-friendly solution to address the issue of excessively large input angle ranges. Then we present a CORDIC-based approximation for RoPE and develop a hardware architecture for it. The experimental results demonstrate that our design can save up to 45.7% area cost and 31.0% power consumption when compared with the fixed-point counterpart, while maintaining almost the same model performance. Compared to the straightforward implementation using floating-point arithmetic, our design can reduce up to 91.4% area cost and 88.9% power consumption, with negligible performance loss.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"244-257"},"PeriodicalIF":3.7,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
F3: An FPGA-Based Transformer Fine-Tuning Accelerator With Flexible Floating Point Format
IF 3.7 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-03-31 · DOI: 10.1109/JETCAS.2025.3555970
Zerong He;Xi Jin;Zhongguang Xu
Transformers have demonstrated remarkable success across various deep learning tasks. However, their inference and fine-tuning require substantial computation and memory resources, posing challenges for existing hardware platforms, particularly resource-constrained edge devices. To address these limitations, we propose F3, an FPGA-based accelerator for Transformer fine-tuning. To reduce computation and memory overhead, this paper proposes a flexible floating point (FFP) format that consumes fewer resources than traditional floating-point formats of the same bit width. We adapt low-rank adaptation (LoRA) to the FFP format and propose a fine-tuning strategy named LR-FFP that reduces the number of trainable parameters without compromising fine-tuning accuracy. At the hardware level, we design specialized processing elements (PEs) for the FFP format. The PE maximizes the utilization of DSP resources, enabling a single DSP to perform two multiply-accumulate operations per cycle. The PEs are organized into a systolic array (SA) to efficiently handle general matrix multiplication during fine-tuning. Through theoretical analysis and experimental evaluation, we determine the optimal dataflow and SA parameters to balance performance and resource consumption. We implement the architecture on the Xilinx VCU128 FPGA platform, where F3 achieves a performance of 8.2 TFLOPS at 250 MHz. Compared with CPU and GPU implementations, F3 achieves speedups of 15.22× and 3.44×, respectively, and energy efficiency improvements of 70.52× and 9.44×.
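The abstract does not define the FFP encoding itself, so the sketch below is only a generic configurable-width float quantizer (sign/exponent/mantissa, with no NaN/Inf or rounding-mode handling) meant to illustrate the kind of design space a flexible floating-point format explores; every parameter choice here is an assumption.

```python
import numpy as np

def to_small_float(x, exp_bits=4, man_bits=3):
    """Round values onto a generic low-bit float grid (illustrative only).

    exp_bits=4, man_bits=3 gives an FP8-E4M3-like grid without special
    values; this is NOT the paper's FFP definition.
    """
    bias = 2**(exp_bits - 1) - 1
    x = np.asarray(x, dtype=np.float64)
    sign, mag = np.sign(x), np.abs(x)
    # per-value exponent, clipped to the representable range (subnormal floor)
    e = np.clip(np.floor(np.log2(np.maximum(mag, 1e-300))), 1 - bias, bias)
    step = 2.0**e / 2**man_bits          # mantissa LSB size at exponent e
    q = np.round(mag / step) * step      # round the mantissa to man_bits
    max_val = (2 - 2.0**-man_bits) * 2.0**bias
    return sign * np.minimum(q, max_val)

w = np.random.randn(6)
print(np.stack([w, to_small_float(w)]))  # original vs. quantized values
```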
{"title":"F3: An FPGA-Based Transformer Fine-Tuning Accelerator With Flexible Floating Point Format","authors":"Zerong He;Xi Jin;Zhongguang Xu","doi":"10.1109/JETCAS.2025.3555970","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3555970","url":null,"abstract":"Transformers have demonstrated remarkable success across various deep learning tasks. However, their inference and fine-tuning require substantial computation and memory resources, posing challenges for existing hardware platforms, particularly resource-constrained edge devices. To address these limitations, we propose F<sup>3</sup>, an FPGA-based accelerator for transformer fine-tuning. To reduce computation and memory overhead, this paper proposes a flexible floating point (FFP) format which consumes fewer resources than traditional floating-point formats of the same bitwidth. We adapt low-rank adaptation to FFP format and propose a fine-tuning strategy named LR-FFP which reduces the number of trainable parameters without compromising fine-tuning accuracy. At the hardware level, we design specialized processing elements (PEs) for the FFP format. The PE maximizes the utilization of DSP resources, enabling a single DSP to perform two multiply-accumulate operations per cycle. The PEs are organized into a systolic array (SA) to efficiently handle general matrix multiplication during fine-tuning. Through theoretical analysis and experimental evaluation, we determine the optimal dataflow and SA parameters to balance performance and resource consumption. We implement the architecture on the Xilinx VCU128 FPGA platform and F<sup>3</sup> achieves a performance of 8.2 TFlops at 250 MHz. Compared with CPU and GPU implementations, F<sup>3</sup> achieves speedups of <inline-formula> <tex-math>$15.22 times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$3.44 times $ </tex-math></inline-formula>, respectively, and energy efficiency improvements of <inline-formula> <tex-math>$70.52 times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$9.44 times $ </tex-math></inline-formula>.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 2","pages":"258-271"},"PeriodicalIF":3.7,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Die-Level Transformation From 2D Shuttle Chips to 3D-IC With TSV for Advanced Rapid Prototyping Methodology With Meta Bonding
IF 3.8 · CAS Tier 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-03-20 · DOI: 10.1109/JETCAS.2025.3572003
Takafumi Fukushima;Tetsu Tanaka;Mitsumasa Koyanagi
3D-IC technology, which may be more appropriately described as TSV (Through-Si Via) formation technology, has been maturing year by year and is increasingly utilized in advanced semiconductor devices, such as 3D CIS (CMOS Image Sensor), HBM (High-Bandwidth Memory), and SRAM-on-CPU (so-called 3D V-Cache) devices. However, the initial development costs remain prohibitively high, largely due to the substantial investment required for TSV formation at the wafer level. Meanwhile, conventional Systems-on-Chip (SoCs) are transitioning from FinFET to GAA (Gate All Around) transistors at the latest beyond-3-nm technology nodes, incorporating extreme ultraviolet (EUV) lithography and other cutting-edge techniques. At the same time, the academic community is establishing an environment conducive to the utilization of nodes ranging from legacy 180 nm down to 7 nm, making it feasible for designers to obtain 2D IC chips implementing their novel architectures at reduced cost. Despite these advancements, foundry shuttle services employing TSV are still almost impossible to access, and performing proof-of-principle and functional verification using 3D-ICs remains extremely challenging. This article introduces recent advancements in technology that can transform 2D-ICs into 3D-ICs using shuttle chips from Multi-Project Wafers (MPWs), at scales from small to large. The article focuses mainly on how die-level, short-TAT (turnaround time) 3D-IC fabrication is facilitated by the key elemental technologies of multi-chip thinning and TSV/microbump formation. In addition, the effectiveness of Meta Bonding, such as fine-pitch microbumps and direct/hybrid bonding, is described for future high-performance 3D-IC prototyping.
{"title":"Die-Level Transformation From 2D Shuttle Chips to 3D-IC With TSV for Advanced Rapid Prototyping Methodology With Meta Bonding","authors":"Takafumi Fukushima;Tetsu Tanaka;Mitsumasa Koyanagi","doi":"10.1109/JETCAS.2025.3572003","DOIUrl":"https://doi.org/10.1109/JETCAS.2025.3572003","url":null,"abstract":"3D-IC technology, it may be more appropriate to refer to this as TSV (Through-Si Via) formation technology, has been maturing year by year and is increasingly utilized in advanced semiconductor devices, such as 3D CIS (CMOS Image Sensor), HBM (High-Bandwidth Memory), and SRAM-on-CPU (named 3D V-Cache) devices. However, the initial development costs remain prohibitively high, largely due to the substantial investment required for TSV formation at the wafer level. Meanwhile, conventional System on a Chips (SoCs) are transitioning from Fin-FET to GAA (Gate All Around) using the latest beyond 3-nm technology nodes, incorporating extreme ultraviolet (EUV) and other cutting-edge techniques. Meanwhile, the academic community is establishing an environment conducive to the utilization of nodes ranging from legacy 180 nm to 7 nm, making it feasible for designers to obtain 2D IC chips with their novel architectures at a reduced cost. Despite these advancements, foundry shuttle services employing TSV are still almost impossible to utilize, and performing proof of principle and functional verification using 3D-ICs remains extremely challenging. This article introduces recent advancements in technology that can transform 2D-ICs into 3D-ICs using shuttle chips for Multi-Project Wafers (MPWs) at a small scale to a large scale. This article mainly focuses on discussing the facilitation of die-level short-TAT (turnaround time) 3D-IC fabrication with key elemental technologies of multi-chip thinning and TSV/microbump formation. In addition, the effectiveness of Meta Bonding, such as fine-pitch microbump and direct/hybrid bonding, is described for future high-performance 3D-IC prototyping.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 3","pages":"415-426"},"PeriodicalIF":3.8,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11007580","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145060958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0