
Latest publications from the 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Ignore Clocking Constraints: An Alternative Physical Design Methodology for Field-Coupled Nanotechnologies
Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00121
R. Wille, Marcel Walter, F. Sill, Daniel Große, R. Drechsler
Field-Coupled Nanocomputing (FCN) allows for conducting computations with a power consumption that is orders of magnitude below that of current CMOS technologies. Recent physical implementations confirmed these prospects and put pressure on the Electronic Design Automation (EDA) community to develop physical design methods comparable to those available for conventional circuits. While the major design task boils down to a place and route problem, certain characteristics of FCN circuits introduce further challenges in terms of dedicated clock arrangements, which lead to rather cumbersome clocking constraints. Thus far, those constraints have only been addressed in a rather unsatisfactory fashion. In this work, we propose a physical design methodology which tackles this problem by simply ignoring the clocking constraints and using adjusted conventional place and route algorithms. In order to deal with the resulting ramifications, a dedicated synchronization element is introduced. Results extracted from a physics simulator confirm the feasibility of the approach. A proof-of-concept implementation illustrates that ignoring clocking constraints indeed opens a promising alternative direction for FCN design that overcomes the obstacles which have prevented the development of efficient solutions thus far.
{"title":"Ignore Clocking Constraints: An Alternative Physical Design Methodology for Field-Coupled Nanotechnologies","authors":"R. Wille, Marcel Walter, F. Sill, Daniel Große, R. Drechsler","doi":"10.1109/ISVLSI.2019.00121","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00121","url":null,"abstract":"Field-Coupled Nanocomputing (FCN) allows for conducting computations with a power consumption that is magnitudes below current CMOS technologies. Recent physical implementations confirmed these prospects and put pressure on the Electronic Design Automation (EDA) community to develop physical design methods comparable to those available for conventional circuits. While the major design task boils down to a place and route problem, certain characteristics of FCN circuits introduce further challenges in terms of dedicated clock arrangements which lead to rather cumbersome clocking constraints. Thus far, those constraints have been addressed in a rather unsatisfactory fashion only. In this work, we propose a physical design methodology which tackles this problem by simply ignoring the clocking constraints and using adjusted conventional place and route algorithms. In order to deal with the resulting ramifications, a dedicated synchronization element is introduced. Results extracted from a physics simulator confirm the feasibility of the approach. 
A proof of concept implementation illustrates that ignoring clocking constraints indeed allows for a promising alternative direction for FCN design that overcomes the obstacles preventing the development of efficient solutions thus far.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"129 1","pages":"651-656"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85616464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
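A minimal sketch (not the paper's algorithm, and with an illustrative graph encoding) of the clocking constraint that conventional FCN design flows must satisfy and that this work proposes to ignore: all signal paths converging at a gate must traverse the same number of clocked tiles, otherwise the gate's inputs arrive in different clock cycles.

```python
# Hedged sketch: check the FCN information-synchronization constraint that all
# paths converging at a gate span the same number of clock zones. The netlist
# encoding (node -> list of predecessors) is an illustrative assumption.

def path_lengths(netlist, node, cache):
    """Set of path lengths (in clocked tiles) from any primary input to node."""
    if node not in cache:
        preds = netlist[node]
        if not preds:                         # primary input
            cache[node] = {0}
        else:
            lengths = set()
            for p in preds:
                lengths |= {l + 1 for l in path_lengths(netlist, p, cache)}
            cache[node] = lengths
    return cache[node]

def clocking_violations(netlist):
    """Gates whose inputs can arrive after differing numbers of clock zones."""
    cache = {}
    return [n for n in netlist if len(path_lengths(netlist, n, cache)) > 1]

# Majority gate 'm' is fed by balanced inputs 'a' and 'b', but by 'c' through
# an extra wire tile 'w': the unbalanced fan-in makes 'm' a violation.
netlist = {'a': [], 'b': [], 'c': [], 'w': ['c'], 'm': ['a', 'b', 'w']}
print(clocking_violations(netlist))  # -> ['m']
```

A conventional flow would have to insert delay tiles (or, in this paper's approach, the proposed synchronization element) wherever such a violation is reported.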
Design of a Hierarchical Clos-Benes Optical Network-on-Chip Architecture
Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00100
Renjie Yao, Yaoyao Ye, Weichen Liu
As chip multiprocessors keep growing in capability, on-chip communication efficiency is crucial to overall performance. However, on-chip networks based on electronic switches suffer from excessive power consumption and limited performance. In order to take advantage of optical interconnects for large-scale on-chip communication in chip multiprocessors, we propose a design of a hierarchical Clos-Benes optical network-on-chip (NoC) with an optimized control and routing scheme. The proposed scheme includes a priority-based round-robin virtual output queue selection and a Q-learning based heuristic routing algorithm for the Clos network, and a traffic-aware adaptive routing for the intra-switch Benes network. By taking network load and runtime path allocation into account, the proposed Q-learning based heuristic routing can predict the best alternative path among all available paths with a much higher path allocation success rate. A case study on a 256-core chip multiprocessor under uniform traffic shows that network throughput is increased by 400%, 60%, and 16% over the mesh, fat-tree, and baseline Clos-Benes optical NoC, respectively. Averaged over a set of real applications, the application end-to-end (ETE) delay is reduced by 48%, 29%, and 20% relative to the mesh, fat-tree, and baseline Clos-Benes network, respectively.
{"title":"Design of a Hierarchical Clos-Benes Optical Network-on-Chip Architecture","authors":"Renjie Yao, Yaoyao Ye, Weichen Liu","doi":"10.1109/ISVLSI.2019.00100","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00100","url":null,"abstract":"As chip multiprocessors keep growing in capability, on-chip communication efficiency is crucial to the overall performance. However, on-chip networks based on electronic switches suffer from excessive power consumption and limited performance. In order to take advantages of optical interconnect for large-scale on-chip communication in chip multiprocessors, we propose a design of hierarchical Clos-Benes optical network-on-chip (NoC) with an optimized control and routing scheme. The proposed control and routing scheme includes a priority based round-robin virtual output queue selection and a Q-learning based heuristic routing algorithm for the Clos network, and a traffic-aware adaptive routing for the intra-switch Benes network. By taking network load and runtime path allocation into account, the proposed Q-learning based heuristic routing can finally predict the best alternative path among all possible available paths with a much better path allocation success rate. A case study on a 256-core chip multiprocessor under uniform traffic shows that the network throughput is increased by 400%, 60%, and 16% respectively than the mesh, fattree and the baseline Clos-Benes optical NoC. 
On average of a set of real applications, the application ETE delay is reduced by 48%, 29%, and 20% respectively than the mesh, fattree and the baseline Clos-Benes network.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"17 1","pages":"523-528"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81861268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
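The Q-learning based path prediction described above can be reduced to a bandit-style sketch: each alternative path through the network is an action whose Q-value tracks its recent allocation success. The reward shaping, epsilon, learning rate, and the noiseless stand-in environment below are illustrative assumptions, not the paper's parameters.

```python
import random

def select_path(q, paths, epsilon, rng):
    """Epsilon-greedy path selection over the candidate paths."""
    if rng.random() < epsilon:                  # explore a random path
        return rng.choice(paths)
    return max(paths, key=lambda p: q[p])       # exploit best-known path

def update(q, path, success, alpha=0.3):
    """Move the path's Q-value toward the observed allocation reward."""
    reward = 1.0 if success else -1.0
    q[path] += alpha * (reward - q[path])

rng = random.Random(0)
paths = ['p0', 'p1', 'p2']
q = {p: 0.0 for p in paths}
# Noiseless stand-in environment: only p1 ever has free resources.
succeeds = {'p0': False, 'p1': True, 'p2': False}
for _ in range(200):
    p = select_path(q, paths, epsilon=0.1, rng=rng)
    update(q, p, succeeds[p])
print(max(q, key=q.get))  # -> p1
```

After a short warm-up the learner exploits the path with the highest allocation success rate, which mirrors how the routing algorithm above improves the path allocation success rate under load.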
Accelerating Deep Neural Networks in Processing-in-Memory Platforms: Analog or Digital Approach?
Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00044
Shaahin Angizi, Zhezhi He, D. Reis, X. Hu, Wilman Tsai, Shy-Jay Lin, Deliang Fan
Nowadays, research on AI accelerator designs has attracted great interest, and accelerating Deep Neural Networks (DNNs) using Processing-in-Memory (PIM) platforms is an actively explored direction with great potential. PIM platforms, which simultaneously aim to address the power- and memory-wall bottlenecks, have shown orders-of-magnitude performance enhancement compared to conventional computing platforms with the Von Neumann architecture. As one direction for accelerating DNNs in PIM, the resistive memory array (a.k.a. crossbar) has drawn great research interest owing to its analog current-mode weighted-summation operation, which intrinsically matches the dominant Multiplication-and-Accumulation (MAC) operation in DNNs, making it one of the most promising candidates. An alternative direction for PIM-based DNN acceleration is bulk bit-wise logic operations performed directly on the content of digital memories. Thanks to the highly fault-tolerant nature of DNNs, recent algorithmic progress has successfully quantized DNN parameters to low bit-width representations while maintaining competitive accuracy. Such DNN quantization techniques essentially convert the MAC operation into much simpler addition/subtraction or comparison operations, which can be performed by bulk bit-wise logic operations in a highly parallel fashion. In this paper, we build a comprehensive evaluation framework to quantitatively compare and analyze the aforementioned PIM-based analog and digital approaches to DNN acceleration.
{"title":"Accelerating Deep Neural Networks in Processing-in-Memory Platforms: Analog or Digital Approach?","authors":"Shaahin Angizi, Zhezhi He, D. Reis, X. Hu, Wilman Tsai, Shy-Jay Lin, Deliang Fan","doi":"10.1109/ISVLSI.2019.00044","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00044","url":null,"abstract":"Nowadays, research topics on AI accelerator designs have attracted great interest, where accelerating Deep Neural Network (DNN) using Processing-in-Memory (PIM) platforms is an actively-explored direction with great potential. PIM platforms, which simultaneously aims to address power- and memory-wall bottlenecks, have shown orders of performance enhancement in comparison to the conventional computing platforms with Von-Neumann architecture. As one direction of accelerating DNN in PIM, resistive memory array (aka. crossbar) has drawn great research interest owing to its analog current-mode weighted summation operation which intrinsically matches the dominant Multiplication-and-Accumulation (MAC) operation in DNN, making it one of the most promising candidates. An alternative direction for PIM-based DNN acceleration is through bulk bit-wise logic operations directly performed on the content in digital memories. Thanks to the high fault-tolerant characteristic of DNN, the latest algorithmic progression successfully quantized DNN parameters to low bit-width representations, while maintaining competitive accuracy levels. Such DNN quantization techniques essentially convert MAC operation to much simpler addition/subtraction or comparison operations, which can be performed by bulk bit-wise logic operations in a highly parallel fashion. 
In this paper, we build a comprehensive evaluation framework to quantitatively compare and analyze aforementioned PIM based analog and digital approaches for DNN acceleration.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"79 1","pages":"197-202"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83770302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
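A minimal sketch of the digital-PIM direction compared above: once weights and activations are quantized to {-1, +1} and packed into machine words, the MAC operation reduces to a bulk bit-wise XNOR followed by a popcount, exactly the kind of operation a digital PIM array can perform in parallel on its rows.

```python
# Hedged sketch: binarized dot product via XNOR + popcount. The packing
# convention (bit 1 encodes +1) is an illustrative assumption.

def pack(vec):
    """Pack a {-1, +1} vector into an integer word."""
    word = 0
    for i, v in enumerate(vec):
        if v == 1:
            word |= 1 << i
    return word

def xnor_dot(wa, wx, n):
    """Dot product of two packed {-1, +1} vectors of length n."""
    matches = bin(~(wa ^ wx) & ((1 << n) - 1)).count('1')  # XNOR + popcount
    return 2 * matches - n                                  # matches minus mismatches

x = [1, -1, 1, 1, -1, 1, -1, -1]
w = [1, 1, -1, 1, -1, -1, -1, 1]
ref = sum(a * b for a, b in zip(x, w))
print(xnor_dot(pack(x), pack(w), len(x)), ref)  # both give the same result
```

The analog crossbar computes the same weighted summation as accumulated currents on a bitline; the framework in the paper is what quantifies the trade-off between these two realizations.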
Deep Learning for Edge Computing: Current Trends, Cross-Layer Optimizations, and Open Research Challenges
Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00105
Alberto Marchisio, Muhammad Abdullah Hanif, Faiq Khalid, George Plastiras, C. Kyrkou, T. Theocharides, M. Shafique
In the Machine Learning era, Deep Neural Networks (DNNs) have taken the spotlight, due to their unmatched performance in several applications, such as image processing, computer vision, and natural language processing. However, as DNNs grow in complexity, their associated energy consumption becomes a challenging problem. This challenge is heightened for edge computing, where the computing devices are resource-constrained and operate on a limited energy budget. Therefore, specialized optimizations for deep learning have to be performed at both the software and hardware levels. In this paper, we comprehensively survey the current trends of such optimizations and discuss key open mid-term and long-term research challenges.
{"title":"Deep Learning for Edge Computing: Current Trends, Cross-Layer Optimizations, and Open Research Challenges","authors":"Alberto Marchisio, Muhammad Abdullah Hanif, Faiq Khalid, George Plastiras, C. Kyrkou, T. Theocharides, M. Shafique","doi":"10.1109/ISVLSI.2019.00105","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00105","url":null,"abstract":"In the Machine Learning era, Deep Neural Networks (DNNs) have taken the spotlight, due to their unmatchable performance in several applications, such as image processing, computer vision, and natural language processing. However, as DNNs grow in their complexity, their associated energy consumption becomes a challenging problem. Such challenge heightens for edge computing, where the computing devices are resource-constrained while operating on limited energy budget. Therefore, specialized optimizations for deep learning have to be performed at both software and hardware levels. In this paper, we comprehensively survey the current trends of such optimizations and discuss key open research mid-term and long-term challenges.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"1 1","pages":"553-559"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83749397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 67
Towards Efficient Compact Network Training on Edge-Devices
Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00020
Feng Xiong, Fengbin Tu, S. Yin, Shaojun Wei
Currently, there is a trend to deploy training on edge devices, which is crucial to future AI applications in various scenarios with transfer and online learning demands. Specifically, there may be a severe degradation of accuracy when directly deploying trained models on edge devices, because the local environment forms an edge-local dataset that often differs from the generic dataset. However, training on edge devices with limited computing and memory capability is a challenging problem. In this paper, we propose a novel quantization training framework for efficient compact network training on edge devices. First, training-aware symmetric quantization is introduced to quantize all of the data types in the training process. Then, a channel-wise quantization method is adopted for compact network quantization, which has significantly higher tolerance to quantization errors and makes the training process more stable. For further training efficiency, we build a hardware evaluation platform to evaluate different settings of the network, so as to achieve a better trade-off among accuracy, energy, and latency.
{"title":"Towards Efficient Compact Network Training on Edge-Devices","authors":"Feng Xiong, Fengbin Tu, S. Yin, Shaojun Wei","doi":"10.1109/ISVLSI.2019.00020","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00020","url":null,"abstract":"Currently, there is a trend to deploy training on edge devices, which is crucial to future AI applications in various scenarios with transfer and online learning demands. Specifically, there may be a severe degradation of accuracy when directly deploying the trained models on edge devices, because the local environment forms an edge local dataset that is often different from the generic dataset. However, training on edge devices with limited computing and memory capability is a challenge problem. In this paper, we propose a novel quantization training framework for efficient compact network training on edge devices. Firstly, training-aware symmetric quantization is introduced to quantize all of the data types in the training process. Then, channel-wise quantization method is adopted for comapact network quantization, which has significantly high tolerance to quantization errors and can make the training process more stable. For further efficient training, we build a hardware evaluation platform to evaluate different settings of the network, so as to achieve a better trade-off among accuracy, energy and latency. 
Finally, we evaluate two widely used compact networks on a domain adaptation dataset for image classification, and the results demonstrate that the proposed methods can allow us achieve an improvement of 8.4 × -17.2× in energy reduction and 11.9 × -16.3× in latency reduction compared with 32-bit implementations, while maintaining the classification accuracy.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"306 1","pages":"61-67"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77127026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
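A minimal sketch (illustrative, not the paper's exact scheme) of the channel-wise symmetric quantization idea above: each output channel gets its own scale, so a channel with small weights is not crushed by a large-magnitude neighbor, which is what raises tolerance to quantization error.

```python
# Hedged sketch: per-channel symmetric uniform quantization. Bit-width,
# rounding, and clipping details are illustrative assumptions.

def quantize_channel(weights, bits=8):
    """Symmetric quantization of one channel to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0   # 1.0 for an all-zero channel
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def quantize_per_channel(tensor, bits=8):
    """tensor: a list of channels, each a list of float weights."""
    return [quantize_channel(ch, bits) for ch in tensor]

tensor = [[0.5, -0.25, 0.1], [0.02, -0.01, 0.015]]   # very different ranges
for (q, scale), ch in zip(quantize_per_channel(tensor), tensor):
    dequant = [v * scale for v in q]
    err = max(abs(a - b) for a, b in zip(ch, dequant))
    assert err <= scale / 2 + 1e-9       # error is bounded by half a step
print([round(s, 5) for _, s in quantize_per_channel(tensor)])
```

With a single shared scale, the second channel's weights would all round to zero at 8 bits of the first channel's range divided more coarsely; the per-channel scales keep both channels resolvable.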
Effect of Loop Positions on Reliability and Attack Resistance of Feed-Forward PUFs
Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00073
S. V. S. Avvaru, K. Parhi
In this paper, we study multiplexer (MUX) based feed-forward (FF) physical unclonable functions (FF PUFs) with 64 stages. We provide the first systematic empirical analysis of the effect of FF PUF design choices on performance by evaluating various FF PUF structures in terms of their reliability and attack resistance. To this end, the change in reliability is studied by varying the location of FF loops and the number of loops within the circuit. It is observed that adding more loops and arbiters makes PUFs more susceptible to noise; FF PUFs with 5 intermediate arbiters can have reliability values as low as 81%. It is further demonstrated that a soft-response thresholding strategy can significantly increase the reliability during authentication to more than 96%. We also show that attack resistance can change as a consequence of the relative positioning of the FF loops.
{"title":"Effect of Loop Positions on Reliability and Attack Resistance of Feed-Forward PUFs","authors":"S. V. S. Avvaru, K. Parhi","doi":"10.1109/ISVLSI.2019.00073","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00073","url":null,"abstract":"In this paper, we study multiplexer (MUX) based feed-forward (FF) physical unclonable functions (FF PUFs) with 64 stages. This paper provides the first systematic empirical analysis of the effect of FF PUF design choices on their performance by evaluating various FF PUF structures in terms of their reliability and attack resistance. To this end, the change in reliability is studied by varying the location of FF loops and varying the number of loops within the circuit. It is observed adding more loops and arbiters makes PUFs more susceptible to noise; FF PUFs with 5 intermediate arbiters can have reliability values that are as low as 81%. It is further demonstrated that a soft-response thresholding strategy can significantly increase the reliability during authentication to more than 96%. We also show that attack resistance can change as a consequence of relative positioning of the FF loops. 
In case of double-loop FF PUFs (one intermediate arbiter with two utputs), it is shown that appropriately choosing the input and output locations of the FF loops, the number of challenge-response pairs required to attack can be increased by 7 times and can be further increased by 15 times if two intermediate arbiters are used.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"76 1","pages":"366-371"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78688023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
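A sketch of the standard additive-delay model commonly used to study MUX-based arbiter PUFs, extended with one feed-forward loop: the intermediate arbiter at stage `tap_in` generates the challenge bit applied at stage `tap_out`. The delay distribution and tap positions are illustrative assumptions, not the paper's measured devices.

```python
import random

# Hedged sketch: linear additive-delay FF PUF model. Each stage contributes a
# (straight, crossed) delay term; the sign of the accumulated difference at
# the end is the response.

def ff_puf_response(stage_delays, challenge, tap_in=20, tap_out=40):
    """stage_delays: per-stage (straight, crossed) delay contributions;
    challenge: list of 0/1 bits; returns the 0/1 arbiter decision."""
    bits = list(challenge)
    diff, ff_bit = 0.0, 0
    for i, (straight, crossed) in enumerate(stage_delays):
        if i == tap_out:
            bits[i] = ff_bit               # loop output overrides this stage
        # bit 0 keeps the race as-is, bit 1 crosses the two signal paths
        diff = diff + straight if bits[i] == 0 else -diff + crossed
        if i == tap_in:
            ff_bit = 1 if diff > 0 else 0  # intermediate arbiter samples here
    return 1 if diff > 0 else 0

rng = random.Random(1)
delays = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(64)]
c = [rng.randint(0, 1) for _ in range(64)]
flipped = list(c)
flipped[40] ^= 1   # the externally supplied bit at the loop's output stage
# The loop output replaces challenge bit 40, so flipping it changes nothing:
print(ff_puf_response(delays, c) == ff_puf_response(delays, flipped))  # -> True
```

The loop positions (`tap_in`, `tap_out`) are exactly the design knobs whose effect on reliability and attack resistance the paper studies; adding noise to `diff` in this model is the usual way such reliability studies are simulated.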
Neuromorphic Image Sensor Design with Region-Aware Processing
Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00089
Md Jubaer Hossain Pantho, Pankaj Bhowmik, C. Bobda
This paper presents a pixel-parallel architecture for a neuromorphic image sensor, designed as a 3D bottom-up architecture composed of several computational planes, where each plane performs a different image processing algorithm. The model emulates the hierarchical process in biological vision by providing feedforward and feedback information flow between the planes. The on-chip attention module dynamically detects regions with relevant information and produces a feedback path to sample those regions at a higher clock frequency, whereas regions with low spatial and temporal information receive less attention. The results suggest that by sampling non-relevant regions at a lower frequency, the sensor can reduce redundancy and enable high-performance computing at low power. Furthermore, by deploying high-level reasoning only on the selected regions instead of the entire image, the model can decrease computational expense.
{"title":"Neuromorphic Image Sensor Design with Region-Aware Processing","authors":"Md Jubaer Hossain Pantho, Pankaj Bhowmik, C. Bobda","doi":"10.1109/ISVLSI.2019.00089","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00089","url":null,"abstract":"This paper presents a pixel parallel architecture of a neuromorphic image sensor, designed as a 3D bottom-up architecture composing of several computational planes where each plane performs different image processing algorithms. The model emulates the hierarchical process in biological vision by providing feedforward and feedback information flow between different planes. The on-chip attention module dynamically detects regions with relevant information and produces a feedback path to sample those regions with a higher clock frequency, whereas regions with low spatial and temporal information receive less attention. The results suggest that by sampling non-relevant regions with a lower frequency, the sensor can reduce redundancy and enable high-performance computing at low power. Furthermore, by deploying high-level reasoning only on the selected regions instead of the entire image the model can decrease computational expenses.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"17 1","pages":"459-464"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87385295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
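The region-aware sampling policy described above can be sketched in a few lines (thresholds, the activity measure, and the refresh period are illustrative assumptions): regions whose attention score is high are read out on every tick, while low-activity regions are only refreshed at a slower rate.

```python
# Hedged sketch: attention-driven sampling schedule for pixel regions.

def sample_schedule(activity, tick, threshold=0.5, slow_period=4):
    """Indices of the pixel regions to sample at this tick."""
    return [i for i, a in enumerate(activity)
            if a >= threshold or tick % slow_period == 0]

activity = [0.9, 0.1, 0.6, 0.05]      # per-region attention scores
for tick in range(4):
    print(tick, sample_schedule(activity, tick))
# -> 0 [0, 1, 2, 3]   (periodic full refresh)
#    1 [0, 2]         (only the attended regions)
#    2 [0, 2]
#    3 [0, 2]
```

The occasional full refresh stands in for the feedback path that lets new activity in a previously quiet region raise its attention score again.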
Focus on What is Needed: Area and Power Efficient FPGAs Using Turn-Restricted Switch Boxes
Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00115
Fatemeh Serajeh-hassani, Mohammad Sadrosadati, S. Pointner, R. Wille, H. Sarbazi-Azad
Field-Programmable Gate Arrays (FPGAs) employ a significant number of SRAM cells in order to provide a flexible routing architecture. While this flexibility allows for a rather easy realization of arbitrary functionality, the required cells significantly increase the area and power consumption of the FPGA. At the same time, it can be observed that full routing flexibility is frequently not needed to efficiently realize the desired functionality. In this work, we propose an FPGA realization which focuses on what is needed and realizes only a subset of the possible routing options using what we call Turn-Restricted Switch Boxes. While this may yield a slight decrease in the run-time performance of the realized functionality, it allows for substantial improvements in area and power consumption. In fact, experimental evaluations confirm that area and power can be reduced by more than 40% and 60%, respectively, in the best cases. The performance overhead is negligible (up to 3%) on average.
{"title":"Focus on What is Needed: Area and Power Efficient FPGAs Using Turn-Restricted Switch Boxes","authors":"Fatemeh Serajeh-hassani, Mohammad Sadrosadati, S. Pointner, R. Wille, H. Sarbazi-Azad","doi":"10.1109/ISVLSI.2019.00115","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00115","url":null,"abstract":"Field-Programmable Gate Arrays (FPGAs) employ a significant amount of SRAM cells in order to provide a flexible routing architecture. While this flexibility allows for a rather easy realization of arbitrary functionality, the respectively required cells significantly increase the area and power consumption of the FPGA. At the same time, it can be observed that full routing flexibility is frequently not needed in order to efficiently realize the desired functionality. In this work, we are proposing an FPGA realization which focuses on what is needed and realizes only a subset of the possible routing options using what we call Turn-Restricted Switch-Boxes. While this may yield a slight decrease in the run-time performance of the realized functionality, it allows for substantial improvements with respect to area and power consumption. In fact, experimental evaluations confirm that area and power can be reduced by more than 40% and 60%, respectively, in the best cases. The performance overhead is negligible (up to 3%), on average.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"89 1","pages":"615-620"},"PeriodicalIF":0.0,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80226950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
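A back-of-the-envelope sketch of the area argument: every allowed turn in a switch box needs a programmable switch and an SRAM configuration cell per routing track, so restricting which turns exist shrinks both. The specific restriction below (keep only straight-through and right turns) is an illustrative assumption, not the paper's exact switch-box pattern.

```python
# Hedged sketch: count SRAM configuration cells for a full vs. a
# turn-restricted switch box, one switch per track per allowed turn.

SIDES = ['N', 'E', 'S', 'W']
OPPOSITE = {'N': 'S', 'S': 'N', 'E': 'W', 'W': 'E'}
RIGHT = {'N': 'W', 'W': 'S', 'S': 'E', 'E': 'N'}

def sram_cells(channel_width, turns):
    """One switch (and SRAM cell) per track per allowed in->out turn."""
    return channel_width * len(turns)

full = [(a, b) for a in SIDES for b in SIDES if a != b]                 # 12 turns
restricted = [(a, b) for a, b in full if b in (OPPOSITE[a], RIGHT[a])]  # 8 turns
saved = 1 - sram_cells(100, restricted) / sram_cells(100, full)
print(f"{saved:.0%} of switch-box SRAM cells removed")  # -> 33%
```

The paper's reported savings (over 40% area in the best cases) come from choosing the restriction pattern so that the router can still realize the required connectivity with a negligible performance penalty.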
Memory Locking: An Automated Approach to Processor Design Obfuscation
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00103
Michael Zuzak, Ankur Srivastava
Conventional logic obfuscation techniques largely focus on locking the functionality of combinational modules. However, for processor design obfuscation, module-level errors are tangential to the fundamental adversarial goal: to produce a processor capable of running useful applications. As noted in previous work such as SFLL, module-level locking poses the following problem: high corruption in a locked module results in a high application-level error rate, but fundamentally leads to SAT attack susceptibility. Therefore, for combinational, module-level locking, increases in application-level error rates are accompanied by a corresponding increase in SAT susceptibility and vice versa. To address this, we introduce an automated and attack-resistant obfuscation technique, called memory locking, which targets on-chip SRAM. We demonstrate the application-level effectiveness of memory locking through system-level simulations of obfuscated processors.
{"title":"Memory Locking: An Automated Approach to Processor Design Obfuscation","authors":"Michael Zuzak, Ankur Srivastava","doi":"10.1109/ISVLSI.2019.00103","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00103","url":null,"abstract":"Conventional logic obfuscation techniques largely focus on locking the functionality of combinational modules. However, for processor design obfuscation, module-level errors are tangential to the fundamental adversarial goal: to produce a processor capable of running useful applications. As noted in previous work such as SFLL, module-level locking poses the following problem: high corruption in a locked module results in a high application-level error rate, but fundamentally leads to SAT attack susceptibility. Therefore, for combinational, module-level locking, increases in application-level error rates are accompanied by a corresponding increase in SAT susceptibility and vice versa. To address this, we introduce an automated and attack-resistant obfuscation technique, called memory locking, which targets on-chip SRAM. We demonstrate the application-level effectiveness of memory locking through system-level simulations of obfuscated processors.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"74 1","pages":"541-546"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80933773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
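The paper above locks on-chip SRAM rather than combinational logic; as a minimal illustration of the key-based locking principle it builds on, here is a toy XOR logic-locking sketch. The function, key-gate placement, and key values are all hypothetical, chosen only to show that a correct key restores the original behavior while a wrong key corrupts it.

```python
def original(a, b, c):
    # Reference combinational function before locking.
    return (a & b) ^ c

def locked(a, b, c, k0, k1):
    # Two XOR key gates inserted on internal nets. Any key with
    # k0 ^ k1 == 0 cancels out and restores the original function;
    # any other key inverts the output on every input pattern.
    t = (a & b) ^ k0
    return t ^ c ^ k1

inputs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

# Correct key (1, 1): output matches on all input patterns.
correct = all(locked(a, b, c, 1, 1) == original(a, b, c)
              for a, b, c in inputs)

# Wrong key (1, 0): count how many input patterns are corrupted.
wrong_key_errors = sum(locked(a, b, c, 1, 0) != original(a, b, c)
                       for a, b, c in inputs)
print(f"correct key matches: {correct}, "
      f"wrong-key errors: {wrong_key_errors}/8")
```

This toy also shows the tension the abstract describes: a wrong key that corrupts every pattern is easy for a SAT attack to rule out, which motivates locking state (here, memory) rather than a single combinational cone.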
Machine Learning Based IoT Edge Node Security Attack and Countermeasures
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00124
Vishalini R. Laguduva, S. A. Islam, Sathyanarayanan N. Aakur, S. Katkoori, Robert Karam
Advances in technology have enabled tremendous progress in the development of a highly connected ecosystem of ubiquitous computing devices collectively called the Internet of Things (IoT). Ensuring the security of IoT devices is a high priority due to the sensitive nature of the collected data. Physically Unclonable Functions (PUFs) have emerged as a critical hardware primitive for ensuring the security of IoT nodes. Malicious modeling of PUF architectures has proven to be difficult due to the inherently stochastic nature of PUF architectures. Extant approaches to malicious PUF modeling assume that a priori knowledge of, and physical access to, the PUF architecture is available for a malicious attack on the IoT node. However, many IoT networks make the underlying assumption that the PUF architecture is sufficiently tamper-proof, both physically and mathematically. In this work, we show that knowledge of the underlying PUF structure is not necessary to clone a PUF. We present a novel non-invasive, architecture-independent, machine learning attack for strong PUF designs with a cloning accuracy of 93.5% and improvements of up to 48.31% over an alternative, two-stage brute-force attack model. We also propose a machine-learning based countermeasure, a discriminator, which can distinguish cloned PUF devices from authentic PUFs with an average accuracy of 96.01%. The proposed discriminator can be used for rapidly authenticating millions of IoT nodes remotely from the cloud server.
{"title":"Machine Learning Based IoT Edge Node Security Attack and Countermeasures","authors":"Vishalini R. Laguduva, S. A. Islam, Sathyanarayanan N. Aakur, S. Katkoori, Robert Karam","doi":"10.1109/ISVLSI.2019.00124","DOIUrl":"https://doi.org/10.1109/ISVLSI.2019.00124","url":null,"abstract":"Advances in technology have enabled tremendous progress in the development of a highly connected ecosystem of ubiquitous computing devices collectively called the Internet of Things (IoT). Ensuring the security of IoT devices is a high priority due to the sensitive nature of the collected data. Physically Unclonable Functions (PUFs) have emerged as critical hardware primitive for ensuring the security of IoT nodes. Malicious modeling of PUF architectures has proven to be difficult due to the inherently stochastic nature of PUF architectures. Extant approaches to malicious PUF modeling assume that a priori knowledge and physical access to the PUF architecture is available for malicious attack on the IoT node. However, many IoT networks make the underlying assumption that the PUF architecture is sufficiently tamper-proof, both physically and mathematically. In this work, we show that knowledge of the underlying PUF structure is not necessary to clone a PUF. We present a novel non-invasive, architecture independent, machine learning attack for strong PUF designs with a cloning accuracy of 93.5% and improvements of up to 48.31% over an alternative, two-stage brute force attack model. We also propose a machine-learning based countermeasure, discriminator, which can distinguish cloned PUF devices and authentic PUFs with an average accuracy of 96.01%. 
The proposed discriminator can be used for rapidly authenticating millions of IoT nodes remotely from the cloud server.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"1 1","pages":"670-675"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84070138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
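Machine-learning modeling attacks of the kind described above are commonly demonstrated in the literature against arbiter PUFs using the standard additive-delay model, where the response is linear in a parity-transformed feature space. The sketch below is such a textbook demonstration (a simulated 16-stage PUF and a plain perceptron), not the authors' specific attack or countermeasure; all parameters are illustrative.

```python
import random

def phi(challenge):
    # Parity feature transform for an n-stage arbiter PUF:
    # phi_i = product over j >= i of (1 - 2*c_j), plus a constant bias term.
    feats = []
    prod = 1.0
    for c in reversed(challenge):
        prod *= 1 - 2 * c
        feats.append(prod)
    feats.reverse()
    feats.append(1.0)  # bias
    return feats

random.seed(0)
N_STAGES = 16

# Simulated "physical" PUF: under the additive-delay model its response
# is the sign of a fixed random weight vector dotted with phi(challenge).
true_w = [random.gauss(0, 1) for _ in range(N_STAGES + 1)]

def puf_response(challenge):
    return 1 if sum(w * f for w, f in zip(true_w, phi(challenge))) > 0 else 0

# The adversary only collects challenge-response pairs (CRPs); no
# structural knowledge beyond the delay-model assumption is used.
crps = []
for _ in range(3000):
    c = [random.randint(0, 1) for _ in range(N_STAGES)]
    crps.append((phi(c), puf_response(c)))
train, test = crps[:2500], crps[2500:]

# Perceptron training on the parity features.
w = [0.0] * (N_STAGES + 1)
for _ in range(50):
    for x, y in train:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        if pred != y:
            delta = 1 if y == 1 else -1
            w = [wi + delta * xi for wi, xi in zip(w, x)]

acc = sum((1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0) == y
          for x, y in test) / len(test)
print(f"clone accuracy on held-out CRPs: {acc:.2f}")
```

Because the simulated responses are exactly linear in the feature space, even this minimal learner clones the PUF with high accuracy from a few thousand CRPs, which is the core vulnerability the paper's discriminator countermeasure is designed to detect.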
Journal
2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)