{"title":"Session details: Session 1B: Emerging Computing and Post-CMOS Technologies","authors":"Deliang Fan","doi":"10.1145/3542683","DOIUrl":"https://doi.org/10.1145/3542683","url":null,"abstract":"","PeriodicalId":188228,"journal":{"name":"Proceedings of the Great Lakes Symposium on VLSI 2022","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124152128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Resiliency in Connected Vehicle Applications: Challenges and Approaches for Security Validation
Authors: Srivalli Boddupalli, Richard Owoputi, Chengwei Duan, T. Choudhury, Sandip Ray
DOI: https://doi.org/10.1145/3526241.3530832
Abstract: With the proliferation of connectivity and smart computing in vehicles, a new attack surface has emerged that targets subversion of vehicular applications by compromising sensors and communication. A unique feature of these attacks is that they no longer require intrusion into the hardware and software components of the victim vehicle; rather, it is possible to subvert the application by providing wrong or misleading information. We consider the problem of making vehicular systems resilient against these threats. A promising approach is to adapt resiliency solutions based on anomaly detection through machine learning. We discuss challenges in making such an approach viable. In particular, we consider the problem of validating such resiliency architectures, the factors that make the problem challenging, and our approaches to address those challenges.

Title: SRS-Mig: Selection and Run-time Scheduling of Page Migration for Improved Response Time in Hybrid PCM-DRAM Memories
Authors: N. Aswathy, Sreesiddesh Bhavanasi, A. Sarkar, H. Kapoor
DOI: https://doi.org/10.1145/3526241.3530327
Abstract: Hybrid memory systems that combine DRAM with Non-Volatile Memory (NVM) can exploit both the scalability of NVM and the performance of DRAM. Placing write-intensive pages in Phase Change Memory (PCM) incurs high write latencies, so migrating such pages from PCM to DRAM helps reduce application execution time and memory response time. Existing techniques mainly focus on selecting a migration candidate and migrating it immediately once it becomes eligible; this direct migration can hamper the response time of regular memory accesses. In this paper, we both identify migration candidates and schedule when they are migrated to DRAM. To realize this, we propose Selection and Run-time Scheduling of page Migration (SRS-Mig), a frame-based scheduling approach for migrations and read/write requests. SRS-Mig reduces migration overhead and ensures that migrated pages see future accesses, yielding improved execution time and memory response time for applications. Experimental evaluation shows a 30% improvement in execution time, a 26% improvement in memory response time, and considerable energy savings over existing baseline techniques.

Title: P3S: A High Accuracy Probabilistic Prediction Processing System for CNN Acceleration
Authors: Hang Xiao, Haobo Xu, Xiaoming Chen, Yujie Wang, Yinhe Han
DOI: https://doi.org/10.1145/3526241.3530322
Abstract: Convolutional Neural Networks (CNNs) achieve state-of-the-art performance on perception tasks at the cost of billions of computational operations. In this paper, we propose a probabilistic prediction processing system, dubbed P3S, that eliminates redundant compute-heavy convolution operations by predicting whether output activations are zero-valued. Exploiting the Gaussian-like distributions of activations and weights in CNNs, P3S computes a partial convolution over only the values whose magnitudes exceed a standard-deviation-related threshold in order to predict ineffectual output activations. If an output activation is predicted to be zero, P3S skips the remaining convolution and sets the output to zero in advance. P3S reduces computations by 67% within a 0.2% accuracy loss and requires no retraining or fine-tuning of the CNNs. We further implement a P3S-based CNN accelerator that achieves a 2.02x speedup and 2.23x higher energy efficiency on average over a traditional accelerator. Compared with a state-of-the-art prediction-based accelerator that incurs 3% accuracy degradation, P3S yields up to a 1.49x speedup and 1.69x higher energy efficiency.

Title: DAReS: Deflection Aware Rerouting between Subnetworks in Bufferless On-Chip Networks
Authors: Rose George Kunthara, Rekha K. James, Simi Zerine Sleeba, John Jose
DOI: https://doi.org/10.1145/3526241.3530332
Abstract: Network on Chip (NoC) is an effective interconnection structure used in the design of efficient Tiled Chip Multi-Processor (TCMP) systems, as it improves system performance manifold. Bufferless NoC has emerged as a popular design choice to address the area and energy concerns associated with buffered NoC systems. For low to medium injection rates, bufferless and buffered routers show similar network performance; as the network load rises, the performance of bufferless router designs deteriorates due to increased deflections. This paper proposes a subnetwork-based bufferless design, DAReS, that minimizes deflections by redirecting contending flits in one subnetwork to unoccupied productive ports of the other subnetwork without incurring any extra cycle delay. Our evaluations show that the proposed design improves network performance by reducing deflection rate and power dissipation, and achieves better throughput than a state-of-the-art bufferless router.

{"title":"Session details: Session 4B: VLSI for Machine Learning and Artifical Intelligence 2","authors":"J. Hu","doi":"10.1145/3542689","DOIUrl":"https://doi.org/10.1145/3542689","url":null,"abstract":"","PeriodicalId":188228,"journal":{"name":"Proceedings of the Great Lakes Symposium on VLSI 2022","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132294673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Reducing Power Consumption using Approximate Encoding for CNN Accelerators at the Edge
Authors: Tongxin Yang, Tomoaki Ukezono, Toshinori Sato
DOI: https://doi.org/10.1145/3526241.3530315
Abstract: Convolutional neural networks (CNNs) have demonstrated significant potential across a range of applications due to their superior accuracy. Edge inference, in which inference is performed locally in embedded systems with limited power resources, is studied for its energy efficiency. This study proposes an approximate encoder that decreases switching activity and thereby reduces power consumption in CNN accelerators at the edge. The proposed encoder performs approximate encoding based on pattern matching between a comparison pattern and the current data; software determines the value of the comparison pattern and whether the encoder is enabled. Experiments on the CIFAR-10 dataset with LeNet5 show that, depending on the comparison pattern, the proposed encoder reduces the power consumption of a CNN accelerator by 21.5% with a 1.59% degradation in inference quality.

Title: A Novel 2T2R CR-based TCAM Design for High-speed and Energy-efficient Applications
Authors: Kangqiang Pan, Amr M. S. Tosson, Ningxuan Wang, N. Zhou, Lan Wei
DOI: https://doi.org/10.1145/3526241.3530336
Abstract: A 2T2R current-race (CR) based ternary content addressable memory (TCAM) design is proposed using resistive random-access memory (RRAM) technology. The design adopts a match-line (ML) booster in the sense amplifier to improve search speed and tolerance to RRAM switching variations. An SR-latch cascading scheme is presented to further improve speed and energy efficiency for large TCAM arrays. Additionally, a same-clock-phase cascading scheme, which places the evaluation phase of all stages in the same clock phase, is proposed to reduce latency in the cascaded structure. With the ML booster, our 64-bit one-stage design matches the speed and energy consumption of the best-performing TCAM designs reported for other emerging non-volatile memories (eNVM). Our 128-bit two-stage design also offers speed and energy comparable to SRAM-based TCAM designs while being significantly more compact (90% size reduction) and non-volatile.

Title: An Oracle-Less Machine-Learning Attack against Lookup-Table-based Logic Locking
Authors: Kaveh Shamsi, Guangwei Zhao
DOI: https://doi.org/10.1145/3526241.3530377
Abstract: Replacing cuts in a circuit with configurable lookup tables (LUTs) that are securely programmed post-fabrication is a logic locking technique that can hide the complete design from an untrusted foundry. In this paper, we study the security of basic LUT-based locking against a set of oracle-less attacks, i.e., attacks that do not have access to a functional oracle of the original circuit. Specifically, we perform cut graph/truth-table prediction using deep and graph neural networks with various data encoding strategies. Overall, we observe that naive LUT-based locking with small cuts of 2 or 3 inputs may be vulnerable to oracle-less approximation, whereas such attacks become less feasible for larger cut sizes. We open-source our software for this attack.

Title: A Tutorial-style Single-cycle Fast Fourier Transform Processor
Authors: Alec Vercruysse, M. W. Miller, Joshua Brake, D. Harris
DOI: https://doi.org/10.1145/3526241.3530329
Abstract: The Fast Fourier Transform (FFT) is one of the most important algorithms of the past century. It computes the discrete Fourier transform with a computational complexity of O(N log2 N), and its structure provides an excellent example of the power of custom hardware accelerators. However, current tutorial-style papers implementing the FFT are not well suited for undergraduate students: they are either too vague on important implementation details or use a pipelined architecture that can obscure the fundamental concepts of the accelerator. This paper presents a simple, single-cycle FFT hardware accelerator that can be implemented on an FPGA, accompanied by source code for easily simulating and synthesizing the design, available at https://doi.org/10.5281/zenodo.6219524.
