
IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

A High-Performance SCNN Accelerator Using Parallel Sparsity Detection and Index-Oriented Computation Workflow
IF 3.1 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-07-10 DOI: 10.1109/TVLSI.2025.3584657
Yishuo Meng;Jianfei Wang;Qiang Fu;Jia Hou;Siwei Xiang;Ge Li;Chen Yang
The customization of accelerators for sparse convolutional neural networks (SCNNs) has been shown to significantly enhance the computational efficiency of CNNs. However, while processing the widely existing irregularly distributed sparsity in filters and feature maps, serial sparsity detection (SSD) methods and small-capacity computation arrays are always applied in current works. As a result, it is difficult to fully translate the exploitation of sparsity into hardware performance improvement. Therefore, in this article, first, a novel parallel sparsity detection (PSD) scheme is proposed and hardware-implemented to efficiently extract the valid weights and activations. In addition, an index-oriented computation workflow for parallel sparse convolution is also proposed to eliminate the output index diversity during sparse convolutions. With the assistance of the above sparsity detection scheme and computation workflow, a large-scale two-side SCNN accelerator is designed and implemented on the Xilinx VCU118 platform, achieving a runtime frequency of 300 MHz. The evaluation results indicate that this work can achieve 1284.43/1105.31 GOPS performance while deploying VGG16/ResNet-50. Compared to the previous dense-/sparse-based works, this work can achieve a performance enhancement ranging from $1.284\times$ to $12.266\times$ and a DSP efficiency improvement from $1.718\times$ to $6.131\times$. These results highlight the superior ability to translate sparsity exploitation into performance gains.
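As a rough software illustration of the two-sided sparsity idea in this abstract, the sketch below builds a nonzero bitmask for the weights and another for the activations, ANDs them, and multiplies only the surviving positions. The function names and the sample vectors are hypothetical, and the loop is serial; it is not the paper's PSD circuit or its index-oriented workflow, only a minimal model of what "extracting valid weights and activations" means.

```python
def nonzero_mask(values):
    """Return an integer bitmask with bit i set when values[i] is nonzero."""
    mask = 0
    for i, v in enumerate(values):
        if v != 0:
            mask |= 1 << i
    return mask


def sparse_dot(weights, activations):
    """Dot product that multiplies only positions where both operands are nonzero.

    Returns (result, number_of_multiplications_performed).
    """
    # A position is "valid" only if both the weight and the activation are nonzero.
    valid = nonzero_mask(weights) & nonzero_mask(activations)
    total = 0
    macs = 0
    index = 0
    while valid:
        if valid & 1:
            total += weights[index] * activations[index]
            macs += 1
        valid >>= 1
        index += 1
    return total, macs


# Hypothetical sample data: 8 positions, only 2 of which are valid on both sides.
weights = [0, 3, 0, -2, 5, 0, 0, 1]
activations = [4, 0, 0, 7, 2, 0, 9, 0]
result, macs = sparse_dot(weights, activations)
print(result, macs)
```

In hardware the two masks and their AND would be available at once, which is where a parallel detector can gain over a serial scan; the software loop here only shows which products are skipped.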
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2449-2461. https://doi.org/10.1109/TVLSI.2025.3584657
Citations: 0
Energy-Efficient Syndrome Calculation Architecture for BCH Decoders
IF 3.1 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-07-10 DOI: 10.1109/TVLSI.2025.3585971
Jeongmin Kim;Jaehoon Kwon;Hansol Jeong;In-Cheol Park
Syndrome calculation (SC) is a critical step in Bose-Chaudhuri-Hocquenghem (BCH) decoding, and its computational efficiency significantly impacts the energy consumption of the entire decoder. This article proposes an energy-efficient SC architecture designed for BCH decoders. The proposed architecture fundamentally adopts a remainder-based SC, which consumes less energy than the conventional Horner’s method-based SC unit. Furthermore, unlike previous remainder-based approaches, it uses a minimal polynomial to produce a shorter remainder, leading to reduced computation and improved energy efficiency. Implementation results demonstrate an 80% improvement in energy efficiency compared to the latest Horner’s method-based SC unit and a 35% improvement compared to the previous remainder-based SC unit.
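The equivalence this abstract relies on can be checked in a few lines: a syndrome S_i = r(α^i) can be computed by Horner's rule over the whole received word, or by first reducing r(x) modulo the minimal polynomial of α^i and evaluating the much shorter remainder. The sketch below assumes a small field, GF(2^4) with primitive polynomial x^4 + x + 1, and a hypothetical 15-bit received word; the function names are illustrative and this is not the paper's architecture.

```python
M = 4
PRIM_POLY = 0b10011      # x^4 + x + 1, assumed primitive polynomial for GF(2^4)
N = (1 << M) - 1         # code length 15
ALPHA = 0b0010           # primitive element alpha


def gf_mul(a, b):
    """Multiply two GF(2^4) elements (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result


def gf_pow(a, e):
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def horner_syndrome(received, i):
    """S_i = r(alpha^i), Horner's rule over all N received bits."""
    x = gf_pow(ALPHA, i)
    acc = 0
    for k in reversed(range(N)):          # highest-degree coefficient first
        acc = gf_mul(acc, x) ^ ((received >> k) & 1)
    return acc


def poly_mod_gf2(dividend, divisor):
    """Remainder of binary polynomial division (bit k = coefficient of x^k)."""
    width = divisor.bit_length()
    while dividend.bit_length() >= width:
        dividend ^= divisor << (dividend.bit_length() - width)
    return dividend


def remainder_syndrome(received, min_poly, i):
    """S_i via r(x) mod m_i(x), then evaluating the short remainder at alpha^i."""
    rem = poly_mod_gf2(received, min_poly)
    x = gf_pow(ALPHA, i)
    acc = 0
    for k in reversed(range(min_poly.bit_length() - 1)):
        acc = gf_mul(acc, x) ^ ((rem >> k) & 1)
    return acc


# Hypothetical received word: all-zero codeword with bit errors at positions 2 and 11.
received = (1 << 2) | (1 << 11)
M1 = 0b10011   # minimal polynomial of alpha:   x^4 + x + 1
M3 = 0b11111   # minimal polynomial of alpha^3: x^4 + x^3 + x^2 + x + 1
print(horner_syndrome(received, 1), remainder_syndrome(received, M1, 1))
print(horner_syndrome(received, 3), remainder_syndrome(received, M3, 3))
```

Both methods agree because m_i(α^i) = 0, so subtracting multiples of m_i(x) from r(x) does not change its value at α^i. The remainder has at most four coefficients here, so the field evaluation is far shorter than the 15-step Horner loop.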
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2488-2496. https://doi.org/10.1109/TVLSI.2025.3585971
Citations: 0
All-Digital CMOS Pulse-Shrinking Time-to-Digital Converter With Built-in Offset-Error Cancellation and Smart Temperature Sensor
IF 3.1 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-07-10 DOI: 10.1109/TVLSI.2025.3585732
Chun-Chi Chen;Chao-Lieh Chen;Kai-Hsiang Chang
This brief presents an all-digital CMOS time-to-digital converter (TDC) with an integrated smart temperature sensor (STS), effectively reducing circuit complexity and cost. Unlike previous designs employing a single coupling unit, the proposed TDC adopts a two-coupling-unit structure, simplifying the overall architecture while enabling pulse-shrinking time measurement and offset-error cancellation within a single cyclic delay line. The built-in cancellation enhances linearity while minimizing overhead. Notably, the integrated STS requires only one additional coupling unit, ensuring a negligible impact on circuit complexity and cost. Fabricated using the TSMC 0.35-$\mu$m CMOS process, the proposed design demonstrates improved cost efficiency compared to prior works. Experimental results validate the successful measurement of time and temperature, highlighting the advantages of reduced complexity and cost savings.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2597-2601. https://doi.org/10.1109/TVLSI.2025.3585732
Citations: 0
ReTern: Exploiting Natural Redundancy and Sign Transformations for Enhanced Fault Tolerance in Compute-in-Memory-Based Ternary LLMs
IF 3.1 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-07-08 DOI: 10.1109/TVLSI.2025.3585043
Akul Malhotra;Sumeet Kumar Gupta
Ternary large language models (LLMs), which use ternary precision weights and 8-bit activations, have demonstrated competitive performance while significantly reducing the high computational and memory requirements of full-precision LLMs. The energy efficiency and performance of ternary LLMs can be further improved by deploying them on ternary computing-in-memory (TCiM) accelerators, thereby alleviating the von-Neumann bottleneck. However, TCiM accelerators are prone to memory stuck-at faults (SAFs) leading to degradation in model accuracy. This is particularly severe for LLMs due to their low weight sparsity. To boost SAF tolerance of TCiM accelerators, we propose ReTern that is based on 1) fault-aware sign transformations (FASTs) and 2) TCiM bitcell reprogramming exploiting their natural redundancy. The key idea is to use FAST to minimize computation errors due to SAFs in +1/−1 weights, while the natural bitcell redundancy is exploited to target SAFs in 0 weights (zero-fix). Our experiments on BitNet b1.58 700M and 3B ternary LLMs show that our technique furnishes significant fault tolerance, notably ~35% reduction in perplexity on the Wikitext dataset in the presence of faults. These benefits come at the cost of <3%, <7%, and <1% energy, latency, and area overheads, respectively.
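The sign-transformation idea can be illustrated with a deliberately simplified fault model. The sketch below assumes each faulty cell is stuck at one ternary value (-1, 0, or +1) and that a group of weights may be stored negated, with the output sign corrected afterwards. The helper names, the group granularity, and the fault model are all assumptions for illustration; the abstract does not give ReTern's bitcell-level mechanism, and the zero-fix step is not modeled here.

```python
def read_back(stored, faults):
    """Values actually held by the array: a faulty cell returns its stuck value."""
    return [faults.get(i, w) for i, w in enumerate(stored)]


def effective_weights(weights, faults, flip):
    """Weights seen by the computation.

    flip=False stores the group as-is; flip=True stores the negated group and
    negates the read-back values again (standing in for an output sign fix).
    """
    sign = -1 if flip else 1
    stored = [sign * w for w in weights]
    return [sign * w for w in read_back(stored, faults)]


def weight_error(weights, faults, flip):
    """Total absolute deviation between intended and effective weights."""
    effective = effective_weights(weights, faults, flip)
    return sum(abs(e - w) for e, w in zip(effective, weights))


def choose_flip(weights, faults):
    """Pick the storage polarity with the smaller weight error."""
    return weight_error(weights, faults, True) < weight_error(weights, faults, False)


# Hypothetical group of four ternary weights with two cells stuck at -1.
weights = [1, -1, 0, 1]
faults = {0: -1, 3: -1}
print(choose_flip(weights, faults), effective_weights(weights, faults, True))
```

In this example both stuck cells disagree with the intended +1 weights, so storing the negated group makes the stuck values coincide with what should be stored and the error drops from 4 to 0. A stuck cell that should hold 0 is unaffected by the polarity choice, which matches the abstract's statement that 0 weights need a separate mechanism.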
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2518-2527. https://doi.org/10.1109/TVLSI.2025.3585043
Citations: 0
FPGA-Oriented Design and Efficient Implementation of a Geometrically Tunable Multiscroll Conservative Chaotic System Without Equilibrium Points
IF 3.1 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-07-04 DOI: 10.1109/TVLSI.2025.3580266
Yerui Guang;Qun Ding;Dongxu Liu
Although multiscroll conservative chaotic systems exhibit rich dynamical characteristics and hold great potential for secure communications, existing designs generally suffer from limited controllability and low hardware implementation efficiency. To address these challenges, this article proposes a novel 4-D multiscroll conservative chaotic system based on a nonlinear feedback structure constructed using the floor function. This original approach simplifies the system’s logical structure, facilitating efficient hardware modeling while enabling flexible control over the number, amplitude, and spatial distribution of scrolls in 3-D space. The system’s high complexity and coexisting behaviors are validated through dynamical analyses, including equilibrium point analysis, Poincaré sections, and Lyapunov exponents (LEs). To achieve efficient deployment of the chaotic system on field-programmable gate array (FPGA) platforms, this article first simplifies the hardware implementation logic of the feedback structure through the design of an algorithmic model based on bitwise operations. Subsequently, precise control of the system’s module signals is achieved through a finite state machine (FSM) design. The results of the resource comparison analysis indicate that the proposed model achieves a high throughput of 10.08 Gbps while consuming only 1051 look-up tables (LUTs). The lower energy efficiency is 0.0264 mW/Mbps. Hardware-software co-simulation and oscilloscope visual output confirm the numerical precision and hardware feasibility of the proposed system. Finally, this system is integrated with the ZUC stream cipher to construct a novel encryption core, enabling asynchronous ciphertext transmission as well as encryption and decryption functions, thereby demonstrating its potential for secure hardware applications.
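One generic reason a floor-based nonlinearity maps well to bitwise logic is that, for two's-complement fixed-point numbers, the floor is just the value with its fractional bits cleared (or an arithmetic right shift if an integer is wanted). The sketch below demonstrates that identity; the 8-bit fractional format and the function names are assumptions, and the paper's actual algorithmic model is not described in the abstract.

```python
import math

FRAC_BITS = 8  # assumed fixed-point format: 8 fractional bits


def to_fixed(x):
    """Convert a real value to a two's-complement fixed-point integer."""
    return int(round(x * (1 << FRAC_BITS)))


def fixed_floor(x_fixed):
    """Floor in the same fixed-point format: clear the fractional bits."""
    return x_fixed & ~((1 << FRAC_BITS) - 1)


def fixed_floor_int(x_fixed):
    """Floor as a plain integer: arithmetic right shift by the fractional width."""
    return x_fixed >> FRAC_BITS


# Sample values that are exactly representable with 8 fractional bits.
for value in (2.75, -2.25, -0.5, 3.0, 0.0):
    print(value, fixed_floor_int(to_fixed(value)), math.floor(value))
```

No comparator or multiplier is needed in either form, and the identity holds for negative inputs as well because the shift is arithmetic.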
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2528-2541. https://doi.org/10.1109/TVLSI.2025.3580266
Citations: 0
A Secure-by-Design Hardware/Operating System as a Substrate for Trustworthy Computing
IF 3.1 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-07-03 DOI: 10.1109/TVLSI.2025.3579484
Sebastian Haas;Christopher Dunkel;Friedrich Pauls;Mattis Hasler;Yogesh Verma;Nilanjana Das;Michael Raitza
Nowadays, digital devices like sensors, cell phones, and home servers are deeply embedded in our world to make our daily lives easier. Since we heavily rely on these systems, it is crucial to guarantee their correct functionality and to ensure security and privacy properties. As systems become increasingly complex, it is difficult to maintain security since it necessitates a thorough understanding of all functionalities in hardware and software. Complexity may lead to vulnerabilities that malicious components can exploit. These components can compromise security features provided by the processing cores and the operating system (OS), jeopardizing the overall trustworthiness of the system. In this article, we provide a secure-by-default hardware/OS co-design to build a substrate for trustworthy computing in digital devices. The design is based on a tiled architecture that can integrate untrusted hardware components. Instead of relying on isolation mechanisms of potentially malicious components, isolation is achieved by dedicated and independent hardware components called trusted communication units (TCUs). By keeping the attack surface small and isolating all components by default, malicious hardware and software are restricted in access permissions and, hence, cannot easily break the system’s security. We implemented a TCU-based multiprocessor architecture in a silicon research chip, called Masur23, and ran transfer workloads and selected portions of the microkernel-based OS M³. Our measurements demonstrate the feasibility of such a hardware/OS co-design for trustworthy computing. Compared to the entire chip implementation, security features require minimal latency, area, and power consumption overhead.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 10, pp. 2862-2872. https://doi.org/10.1109/TVLSI.2025.3579484
Citations: 0
Stochastic Belief Propagation-Based Iterative Detection and Decoding for MIMO Systems
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-07-02 DOI: 10.1109/TVLSI.2024.3477963
Muhao Li;Houren Ji;Xiaosi Tan;Chuan Zhang
In this brief, a stochastic belief propagation (BP)-based iterative detection and decoding (IDD) scheme for multiple-input multiple-output (MIMO) systems is proposed. We modify the BP detection algorithm to make it more suitable for stochastic computation and to enable soft messages to be transmitted between the detector and decoder as stochastic sequences. Through IDD, the number of iterations and the quantization precision required by the detector decrease. By sharing the stochastic number generator, the hardware complexity of both the detector and decoder can be reduced. Hardware architectural optimizations and the corresponding implementation are also given: a $64 \times 32$, 4-QAM MIMO system with (128, 64) polar codes is implemented with $1.283~\text{mm}^{2}$ area consumption. Compared with other detectors, the hardware efficiency is improved by 7.8 times.
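For readers unfamiliar with stochastic computation, the sketch below shows the basic primitive such designs build on: a probability is encoded as the fraction of ones in a random bitstream, and multiplying two probabilities reduces to a bitwise AND of two independent streams. The stream length, seed, and function names are assumptions chosen for illustration; this is the textbook unipolar encoding, not the paper's BP detector or decoder.

```python
import random


def to_stream(p, length, rng):
    """Encode probability p as a unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_stream(bits):
    """Decode a bitstream back to a probability estimate (fraction of ones)."""
    return sum(bits) / len(bits)


def stochastic_multiply(a_bits, b_bits):
    """One AND gate per bit pair; multiplies the encoded values
    provided the two streams are statistically independent."""
    return [a & b for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)      # fixed seed so the run is repeatable
LENGTH = 4096               # assumed stream length
a = to_stream(0.6, LENGTH, rng)
b = to_stream(0.5, LENGTH, rng)
product = from_stream(stochastic_multiply(a, b))
print(product)              # an estimate close to 0.6 * 0.5 = 0.30
```

The estimate carries sampling noise that shrinks as the stream gets longer, which is the accuracy-versus-latency trade-off any stochastic datapath has to manage. The AND trick also depends on the two streams being uncorrelated, so a shared number generator has to be arranged so that streams feeding the same gate stay independent.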
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 8, pp. 2324-2328. https://doi.org/10.1109/TVLSI.2024.3477963
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-06-30 DOI: 10.1109/TVLSI.2025.3579662
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, p. C2. https://doi.org/10.1109/TVLSI.2025.3579662 Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11059982
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IF 2.8 Zone 2 (Engineering & Technology) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-06-30 DOI: 10.1109/TVLSI.2025.3579664
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2025.3579664","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3579664","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"C3-C3"},"PeriodicalIF":2.8,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11059983","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing Memory BIST With an Optimized RTL-BIST IP Core: A Low-Power, High-Fault-Coverage Approach
IF 3.1 Zone 2, Engineering & Technology Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-06-27 DOI: 10.1109/TVLSI.2025.3581296
Ming-Yi Lin;Wei-Kuan Chiang;Chin-Hung Wang
The increasing density of static random access memory (SRAM) in modern system-on-chip (SoC) architectures has intensified the need for efficient built-in self-test (BIST) solutions to ensure fault detection and repair. This article presents an optimized register transfer level (RTL)-BIST intellectual property core (IP core) that integrates a novel March mSR+ algorithm, providing a low-power, high-fault-coverage approach to embedded memory testing. Developed using high-level synthesis (HLS), the proposed framework enhances test efficiency while minimizing hardware complexity. Experimental results on field-programmable gate array (FPGA) implementations demonstrate that the March mSR+ algorithm achieves an 88.89% fault coverage while reducing power consumption compared with conventional March-based testing methods. These findings validate the effectiveness of the RTL-BIST framework in improving memory reliability for artificial intelligence (AI), high-performance computing (HPC), and safety-critical applications.
Vol. 33, No. 9, pp. 2556-2569.
Citations: 0