2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Papers


Improving Logic Optimization in Sequential Circuits using Majority-inverter Graphs

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00049

Walter Lau Neto, Xifan Tang, Max Austin, L. Amarù, P. Gaillardon

A majority-inverter graph (MIG) is a recently introduced Boolean network that enables efficient logic manipulation. Recent works show that MIGs can achieve significant improvements in area, delay, and power when compared to current academic and commercial tools. However, current MIG optimizations are limited to combinational circuits and ignore the sequential elements that are ubiquitous in practical implementations. This paper is the first to study sequential optimization opportunities using MIGs. The presented extension brings the efficiency of MIG area- and depth-oriented rewriting algorithms for combinational circuits to sequential networks. Experimental results show that, averaged over the OpenCores benchmark suite, (1) in technology-independent evaluations, compared to a popular academic tool, our MIG-based sequential optimization improves area and delay by 9% and 38%, respectively; (2) in a standard optimization plus technology-mapping flow for ASICs with a 7nm predictive standard cell library, the proposed sequential optimizer outperforms academic and commercial tools in energy-delay product (EDP) by 12% and 4%, respectively, and in area-delay product (ADP) by 13% and 7%, respectively.


Pages: 224-229

Citations: 2
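The MIG paper above relies on rewriting with majority nodes. As a minimal sketch (not the authors' algorithm), the Python below checks a few well-known majority identities by exhaustive simulation and applies one depth-reducing rewrite to the next-state logic of a toy one-register circuit. It assumes the common convention of treating register outputs as pseudo-primary inputs so that combinational rewriting can be reused; the function names and the toy circuit are hypothetical.

```python
from itertools import product
import random

def maj(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 iff at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def neg(a: int) -> int:
    """Complemented edge in an MIG."""
    return 1 - a

def check_identities() -> bool:
    """Exhaustively verify majority identities over all 0/1 assignments."""
    for x, y, z, u, v in product((0, 1), repeat=5):
        assert maj(x, y, 0) == (x & y)                       # AND as a majority node
        assert maj(x, y, 1) == (x | y)                       # OR as a majority node
        assert maj(x, x, y) == x                             # majority rule
        assert maj(x, neg(x), y) == y                        # majority with complement
        assert maj(x, y, maj(u, v, z)) == maj(maj(x, y, u), maj(x, y, v), z)  # distributivity
        assert neg(maj(x, y, z)) == maj(neg(x), neg(y), neg(z))              # inverter propagation
    return True

# Toy sequential circuit: register s, primary inputs x and z.
# Original next-state cone has depth 2: M(x, s, M(x, s, z)).
# Distributivity with u = x, v = s gives M(M(x, s, x), M(x, s, s), z),
# and the majority rule reduces that to M(x, s, z), which has depth 1.
def next_state_original(s: int, x: int, z: int) -> int:
    return maj(x, s, maj(x, s, z))

def next_state_rewritten(s: int, x: int, z: int) -> int:
    return maj(x, s, z)

def simulate(next_state, inputs, s0: int = 0):
    """Clock the register through (x, z) inputs and return the state trace."""
    s, trace = s0, []
    for x, z in inputs:
        s = next_state(s, x, z)  # register output feeds the cone as a pseudo-input
        trace.append(s)
    return trace

if __name__ == "__main__":
    rng = random.Random(0)
    stimulus = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(1000)]
    print(check_identities())  # True
    print(simulate(next_state_original, stimulus) == simulate(next_state_rewritten, stimulus))  # True
```

Simulating both versions on the same random stimulus is only a cheap sanity check that the rewrite preserved sequential behavior; a real flow would rely on formal equivalence checking.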

Dark-Silicon Inspired Energy Efficient Hierarchical TDM NoC

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00030

Salma Hesham, D. Göhringer, M. A. E. Ghany

In this paper, we propose a dark-silicon-inspired hierarchical time-division-multiplexing (TDM) network-on-chip (NoC) with an online distributed setup scheme for slot allocation. In addition to the normal mesh routers, we propose hierarchical routers that use the dim silicon parts of the chip to hierarchically connect quad-router units. Normal routers operate at the full chip frequency with a 1.2 V supply, while hierarchical routers operate at half the chip frequency with a 0.8 V supply, double the data width, and half the slot size. Routers follow a proposed architecture that separates the data-path and control-setup sub-routers. This allows separate clocks and supply voltages for data and control, and keeps the control a single-slot-cycle design independent of the data slot size. The proposed NoC architecture and a state-of-the-art baseline NoC are evaluated under uniform random traffic using Synopsys VCS and synthesized with Synopsys Design Compiler for SAED90nm technology. Under the same power budget as the baseline NoC, the proposed architecture provides up to a 74% improvement in setup latency, a 32% higher NoC saturation load, and 21% higher success rates. The proposed hierarchical quad leverages the dim silicon parts of the chip for an energy-efficient design. Although it occupies 1.78 times the area of the baseline quad, 56% of that area is under-clocked at half the maximum chip frequency, which reduces the power density to 52% of that of the baseline NoC.


Pages: 116-121

Citations: 0
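Two mechanisms in the TDM NoC abstract above can be made concrete with a small sketch: reserving TDM slots along a path, and why a router running at half frequency with double data width at 0.8 V keeps link bandwidth while cutting dynamic power. The slot model assumes a contention-free TDM scheme in which a flit advances one hop per slot, so the slot used at hop h is (s + h) mod S. That assumption, the router names, the 1 GHz / 32-bit figures, and the first-order relation P ~ C·V²·f are illustrative and are not the paper's exact setup scheme.

```python
from typing import Dict, List, Optional

NUM_SLOTS = 8  # hypothetical slot-table size per router output

def try_reserve(tables: Dict[str, List[Optional[int]]], path: List[str],
                start_slot: int, conn_id: int) -> bool:
    """Reserve slot (start_slot + hop) % NUM_SLOTS at each router on path.

    Models a setup packet walking the path: if any hop's slot is already
    owned, the request fails and nothing is reserved.
    """
    slots = [(start_slot + hop) % NUM_SLOTS for hop in range(len(path))]
    if any(tables[r][s] is not None for r, s in zip(path, slots)):
        return False
    for r, s in zip(path, slots):
        tables[r][s] = conn_id
    return True

def bandwidth_bits_per_s(freq_hz: float, width_bits: int) -> float:
    """Raw link bandwidth: one data word per clock cycle."""
    return freq_hz * width_bits

def relative_dynamic_power(v: float, f: float, v_ref: float, f_ref: float,
                           c_ratio: float = 1.0) -> float:
    """First-order dynamic power ratio, P ~ C * V^2 * f."""
    return c_ratio * (v / v_ref) ** 2 * (f / f_ref)

if __name__ == "__main__":
    tables = {r: [None] * NUM_SLOTS for r in ("R0", "R1", "R2")}
    print(try_reserve(tables, ["R0", "R1", "R2"], 0, conn_id=1))  # True
    print(try_reserve(tables, ["R1", "R2"], 1, conn_id=2))        # False: R1 slot 1 taken
    print(try_reserve(tables, ["R1", "R2"], 2, conn_id=2))        # True
    # Hierarchical router: half frequency, double width, same bandwidth.
    print(bandwidth_bits_per_s(1e9, 32) == bandwidth_bits_per_s(0.5e9, 64))  # True
    # 0.8 V vs 1.2 V, half frequency, double switched capacitance (wider datapath).
    print(relative_dynamic_power(0.8, 0.5e9, 1.2, 1e9, c_ratio=2.0))  # about 0.444
```

Under this first-order model the hierarchical link delivers the same bandwidth at roughly 4/9 of the dynamic power; leakage and the extra wiring of the wider datapath are ignored.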

Linear Optimization for Memristive Device in Neuromorphic Hardware

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00088

Jingyan Fu, Zhiheng Liao, Na Gong, Jinhui Wang

Memristors are an attractive hardware solution for neuromorphic computing; however, their nonlinear conductance response makes weight updates difficult and reduces neural network accuracy. This paper proposes a piecewise linear (PL) method that mitigates the nonlinear effect of memristors by calculating the weight-update parameters along a piecewise line, which reduces errors in the weight-update process. It is a simple but efficient way to mitigate nonlinearity without reading the memristor's current conductance at each update, thereby avoiding complex peripheral circuits. PL methods with 2-segment, 3-segment, and 4-segment models are investigated under two split-point selection strategies. The results show that, under different degrees of nonlinearity, the PL method improves the recognition accuracy on MNIST handwritten digits to 87.87%-95.05%, compared with 10.77%-73.18% without the PL method. Finally, the results indicate that the more segments a PL method uses, the smaller the weight deviation caused by the nonlinearity of the synaptic device.


Citations: 4
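The piecewise-linear idea in the memristor paper above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' model: the device follows an assumed saturating curve G(p) = B(1 - e^(-p/A)) in normalized conductance, only potentiation is modeled, split points are equally spaced in pulse count, and all constants are hypothetical. The controller computes pulse counts from the weights it intended to write and never reads the device back, mirroring the abstract's claim of avoiding conductance reads.

```python
import math

P_MAX = 100   # hypothetical: pulses that sweep the full conductance range
A = 20.0      # hypothetical nonlinearity constant (smaller means more nonlinear)
B = 1.0 / (1.0 - math.exp(-P_MAX / A))  # normalizes device_g(P_MAX) to 1.0

def device_g(p: float) -> float:
    """Assumed saturating potentiation curve: normalized conductance after p pulses."""
    return B * (1.0 - math.exp(-p / A))

def pl_position(w: float, k: int) -> float:
    """Pulse position for weight w on a k-segment piecewise-linear model of device_g.

    Split points are equally spaced in pulses (one possible strategy);
    k = 1 is the naive 'device is linear' assumption.
    """
    xs = [i * P_MAX / k for i in range(k + 1)]
    ys = [device_g(x) for x in xs]
    for i in range(k):
        if w <= ys[i + 1]:
            return xs[i] + (xs[i + 1] - xs[i]) * (w - ys[i]) / (ys[i + 1] - ys[i])
    return float(P_MAX)

def pulses_for_update(w_old: float, w_new: float, k: int) -> int:
    """Pulse count for w_old -> w_new, computed from tracked weights only (no read-back)."""
    return round(pl_position(w_new, k)) - round(pl_position(w_old, k))

def final_error(targets, k: int) -> float:
    """Apply a sequence of increasing target weights and return the final deviation."""
    applied, w = 0, 0.0
    for t in targets:
        applied += pulses_for_update(w, t, k)
        w = t
    return abs(device_g(applied) - w)

def mean_error(k: int, n: int = 50) -> float:
    """Mean final deviation over a sweep of target weights in (0, 1)."""
    errs = [final_error([0.25 * t, 0.5 * t, t], k) for t in (i / n for i in range(1, n))]
    return sum(errs) / len(errs)

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, round(mean_error(k), 4))
```

With k = 1 the model degenerates to the naive linear assumption, so comparing mean_error(1) with larger k shows how much the piecewise approximation reduces deviation in this toy setup; the exact numbers depend entirely on the assumed constants.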

TrustFlow: A Trusted Memory Support for Data Flow Integrity

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00063

C. Bresch, D. Hély, Stéphanie Chollet, I. Parissis

With the emergence of the Internet of Things (IoT), embedded computing cores are increasingly used to handle critical applications. To avoid faulty scenarios on these devices, extra hardware support is needed against exploits of memory-corruption bugs. To address this issue, this paper presents a new, efficient, fine-grained data-flow integrity mechanism based on a translation lookaside buffer. The concept is validated by extending the RISC-V instruction set and implementing it on a Digilent Xilinx Arty-35T board. The results show that the approach requires only a few extensions to the processor pipeline and the compiler, and induces no software overhead at run time.


Citations: 2
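Data-flow integrity, as used in the TrustFlow abstract above, checks at every load that the last write to the address came from a store that the compiler's static analysis allows for that load. The sketch below models only that check in software; TrustFlow builds its mechanism on a translation lookaside buffer and RISC-V ISA extensions, which this sketch does not attempt to reproduce. The class and method names, store IDs, and load-site labels are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

class DataFlowIntegrity:
    """Toy data-flow-integrity checker.

    Each store carries a static instruction ID; the runtime records the last
    writer of every address, and each load checks that writer against the set
    of writers a static reaching-definitions analysis allowed for that load.
    """

    def __init__(self, allowed: Dict[str, FrozenSet[int]]):
        self.allowed = allowed                    # load site -> allowed writer IDs
        self.last_writer: Dict[int, int] = {}     # address -> ID of last store
        self.memory: Dict[int, int] = {}

    def store(self, store_id: int, addr: int, value: int) -> None:
        self.memory[addr] = value
        self.last_writer[addr] = store_id         # metadata update on every store

    def load(self, load_site: str, addr: int) -> int:
        writer: Optional[int] = self.last_writer.get(addr)
        if writer not in self.allowed[load_site]:
            raise RuntimeError(f"DFI violation at {load_site}: addr {addr:#x} written by {writer}")
        return self.memory[addr]

if __name__ == "__main__":
    # Only store 1 (initialization) may define the flag read at "check_admin".
    dfi = DataFlowIntegrity({"check_admin": frozenset({1})})
    dfi.store(1, 0x100, 0)
    print(dfi.load("check_admin", 0x100))  # 0
    dfi.store(7, 0x100, 1)                 # e.g. an overflowing buffer write
    try:
        dfi.load("check_admin", 0x100)
    except RuntimeError as err:
        print(err)
```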

Modeling Hardware Trojans in 3D ICs

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00093

Zhiming Zhang, Qiaoyan Yu

Three-dimensional (3D) integration makes it possible to integrate an increasing number of transistors into a single package. Despite the improved performance and power efficiency, integrating multiple dies into the same package can lead to new security threats, such as 3D hardware Trojans. In this work, we first provide a thorough survey of hardware Trojans reported in 3D integrated circuits and systems, and then propose comprehensive 3D hardware Trojan models. A case study verifies the implementation feasibility of a thermally triggered 3D Trojan. The activation speed of the 3D Trojan is compared with that of its 2D counterpart, confirming that 3D ICs provide a better environment for hiding thermal Trojans.


Citations: 5
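The thermal-trigger comparison in the 3D Trojan abstract above can be reasoned about with a first-order lumped RC thermal model. This is an assumption for illustration, not the paper's case study: a Trojan fires when the local temperature crosses a threshold, and a die buried in a 3D stack is modeled simply as having a higher thermal resistance to ambient. All numbers (temperatures in °C, power in W, R in K/W, C in J/K) are hypothetical.

```python
import math

def time_to_trigger(t_amb: float, power_w: float, r_th: float,
                    c_th: float, t_trigger: float) -> float:
    """Time for a lumped RC thermal node to reach t_trigger; inf if never reached.

    T(t) = T_amb + P*R*(1 - exp(-t / (R*C)))
    =>  t = -R*C*ln(1 - (T_trig - T_amb) / (P*R))
    """
    rise_needed = t_trigger - t_amb
    max_rise = power_w * r_th          # steady-state temperature rise
    if rise_needed <= 0:
        return 0.0                     # already above the trigger temperature
    if max_rise <= rise_needed:
        return math.inf                # steady state never reaches the trigger
    return -r_th * c_th * math.log(1.0 - rise_needed / max_rise)

if __name__ == "__main__":
    # Hypothetical: same power and capacitance, 3D die has twice the thermal resistance.
    t_2d = time_to_trigger(45.0, 2.0, 10.0, 0.5, 60.0)
    t_3d = time_to_trigger(45.0, 2.0, 20.0, 0.5, 60.0)
    print(round(t_2d, 2), round(t_3d, 2))
```

With these hypothetical values the 3D node reaches the trigger temperature sooner (about 4.7 s versus 6.9 s), even though its RC time constant is larger, because its steady-state temperature rise is twice as high.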

Morphed Standard Cell Layouts for Pin Length Reduction

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00025

Cheng-Wei Tai, Rung-Bin Lin

In this article, we present the concept of morphed layouts: multiple layouts of the same standard cell that differ in the footprints of their pins. We propose two approaches that exploit morphed layouts to reduce pin length. The first approach is applied after placement but before routing. It enables design-space exploration to find the best trade-off between total wire length and via count, and it obtains better results than previous work on large circuits. The second approach is applied to a routed design and always reduces pin length without increasing the via count. On average, it reduces total pin length by 12.1% and total wire length by 3.4%.


Citations: 4

CAESAR-MPSoC: Dynamic and Efficient MPSoC Security Zones

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00092

Siavoosh Payandeh Azad, G. Jervan, Michael Tempelmeier, Martha Johanna Sepúlveda

Dynamic security zones in Multiprocessor Systems-on-Chip (MPSoCs) have been used to isolate sensitive applications from potential attackers. These physical wrappers are usually configured through programmable hardware firewalls. Previous work has shown the effectiveness of this security mechanism against a wide variety of attacks. However, the security-zone configuration itself is performed in an unprotected way, exposing the system to attacks through rogue firewall updates. In this work, we propose CAESAR-MPSoC, an enhanced MPSoC that ensures protected configuration of the firewalls through encrypted and authenticated reconfiguration packets. To this end, we present four contributions. First, we integrate two CAESAR (Competition for Authenticated Encryption: Security, Applicability, and Robustness) hardware IP cores, ASCON and AEGIS, into MPSoCs. Second, we develop a lightweight interface that allows the different CAESAR cores to be plugged into the MPSoC environment. Third, we demonstrate the protected configuration of security zones. Fourth, we evaluate the security, area, and cost of CAESAR-MPSoC. The results show that our solution is feasible and effective, enabling protected and efficient security-zone configuration.


Pages: 477-482

Citations: 10
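CAESAR-MPSoC protects firewall reconfiguration with authenticated encryption (ASCON or AEGIS). Python's standard library ships neither cipher, so the sketch below swaps in HMAC-SHA256 to show only the authentication side: a firewall model accepts a reconfiguration packet only if its tag verifies and its counter is newer than the last accepted one. The packet layout, the replay counter, and the key are hypothetical; a real design would also encrypt the rules, which this sketch does not do.

```python
import hashlib
import hmac
import struct
from typing import Dict

HEADER = struct.Struct(">QI")           # hypothetical layout: 64-bit counter, 32-bit zone ID
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes

def make_packet(key: bytes, counter: int, zone_id: int, rules: bytes) -> bytes:
    """Build an authenticated (not encrypted) reconfiguration packet."""
    header = HEADER.pack(counter, zone_id)
    tag = hmac.new(key, header + rules, hashlib.sha256).digest()
    return header + rules + tag

class Firewall:
    """Toy firewall that only accepts authentic, fresh reconfiguration packets."""

    def __init__(self, key: bytes):
        self.key = key
        self.last_counter = -1
        self.rules: Dict[int, bytes] = {}

    def apply(self, packet: bytes) -> bool:
        if len(packet) < HEADER.size + TAG_LEN:
            return False
        header = packet[:HEADER.size]
        body = packet[HEADER.size:-TAG_LEN]
        tag = packet[-TAG_LEN:]
        expected = hmac.new(self.key, header + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False  # forged or corrupted update: keep the old configuration
        counter, zone_id = HEADER.unpack(header)
        if counter <= self.last_counter:
            return False  # replayed update
        self.last_counter = counter
        self.rules[zone_id] = body
        return True

if __name__ == "__main__":
    key = b"hypothetical-shared-key"
    fw = Firewall(key)
    pkt = make_packet(key, 1, 3, b"allow:core2")
    print(fw.apply(pkt))                                        # True
    print(fw.apply(pkt))                                        # False: replay
    print(fw.apply(make_packet(b"attacker", 2, 3, b"allow:*")))  # False: bad tag
```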

Optimization of Comparator Selection Algorithm for TIQ Flash ADC Using Dynamic Programming Approach

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00095

Ali Ozdemir, Mshabab Alrizah, Kyusun Choi

The Threshold Inverter Quantization (TIQ) architecture for flash analog-to-digital converters (ADCs) uses inverters as voltage comparators. The TIQ approach has many advantages over differential voltage comparators, but it is hard to create and select comparators for it. Precise selection of the gate switching voltage is crucial for flash ADCs. Differential nonlinearity (DNL) and integral nonlinearity (INL) error measurements are therefore used to quantify how precisely the voltage comparators are selected, and different selection algorithms have been used to make this selection as precise as possible. In this work, we present two new algorithms based on a dynamic programming approach, along with DNL and INL simulation results. Compared with state-of-the-art methods, the new approach achieves 4x and 5x DNL improvements for 6-bit and 8-bit ADCs, respectively.


Citations: 4
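The TIQ abstract above does not say what its dynamic program optimizes, so the following is only one plausible formulation, stated as an assumption: from a pool of candidate inverters with known switching voltages, choose one per ideal threshold, in increasing order, minimizing the sum of squared deviations from the ideal thresholds. This classic O(N·M) subsequence DP guarantees monotonic thresholds; DNL and INL are then computed from the selected thresholds using their standard definitions. Function names are hypothetical.

```python
import math
from typing import List, Tuple

def select_comparators(candidates: List[float], targets: List[float]) -> Tuple[List[float], float]:
    """Pick len(targets) switching voltages, in increasing order, minimizing the
    sum of squared deviations from the ideal thresholds (O(N*M) dynamic program)."""
    v = sorted(candidates)
    n, m = len(v), len(targets)
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]   # dp[i][j]: best cost, j targets from first i candidates
    take = [[False] * (m + 1) for _ in range(n + 1)]    # whether candidate i-1 is used for target j-1
    for i in range(n + 1):
        dp[i][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            skip_cost = dp[i - 1][j]
            take_cost = dp[i - 1][j - 1] + (v[i - 1] - targets[j - 1]) ** 2
            if take_cost < skip_cost:
                dp[i][j], take[i][j] = take_cost, True
            else:
                dp[i][j] = skip_cost
    if dp[n][m] == math.inf:
        raise ValueError("not enough candidate comparators")
    chosen, i, j = [], n, m
    while j > 0:                      # walk back through the recorded choices
        if take[i][j]:
            chosen.append(v[i - 1])
            j -= 1
        i -= 1
    return chosen[::-1], dp[n][m]

def dnl_inl(thresholds: List[float], v_lsb: float, v_first: float):
    """DNL_k = (V_{k+1} - V_k)/LSB - 1;  INL_k = (V_k - ideal_k)/LSB."""
    dnl = [(b - a) / v_lsb - 1.0 for a, b in zip(thresholds, thresholds[1:])]
    inl = [(t - (v_first + k * v_lsb)) / v_lsb for k, t in enumerate(thresholds)]
    return dnl, inl

if __name__ == "__main__":
    pool = [0.2, 0.26, 0.49, 0.8]     # hypothetical switching voltages (V)
    sel, cost = select_comparators(pool, [0.25, 0.5, 0.75])
    print(sel, dnl_inl(sel, 0.25, 0.25))
```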

PageCmp: Bandwidth Efficient Page Deduplication through In-memory Page Comparison

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00023

Mehrnoosh Raoufi, Quan Deng, Youtao Zhang, Jun Yang

KSM-based page deduplication is an important Linux system service for reducing main-memory consumption on cloud servers. However, it tends to incur large computation and memory-bandwidth overheads. Recently proposed hardware-assisted KSM approaches effectively address the computation overhead but still consume a large amount of off-chip memory bandwidth. In this paper, we propose PageCmp, a processing-in-memory (PIM) based page deduplication approach that achieves bandwidth efficiency on cloud servers. PageCmp exploits the bitwise operation capability inside the DRAM cell array to enable fast page comparison. By integrating a lightweight local comparator into the output buffer of DRAM modules, PageCmp sends only the page comparison result back to the processor. Our experimental results show that, compared with the state of the art, PageCmp reduces memory bandwidth by 4x while introducing less than 1% hardware overhead.


Citations: 4
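As a rough illustration of the PageCmp idea above, the sketch compares two pages by XOR-ing them word by word and OR-reducing the result (the kind of bitwise operation the abstract says PageCmp performs inside DRAM), and counts how many bytes would cross the off-chip bus if the CPU read both pages versus receiving only a result. It is a toy model under stated assumptions (4 KiB pages, one result byte per comparison); it does not model KSM's tree search or reproduce the paper's 4x figure, which is measured against hardware-assisted KSM rather than a plain byte comparison.

```python
PAGE_SIZE = 4096  # bytes; a common page size, assumed here

def pages_equal_in_memory(page_a: bytes, page_b: bytes, word_bytes: int = 8) -> bool:
    """Model of an in-DRAM comparison: XOR the pages word by word and OR-reduce.

    In the PIM setting only this single result would leave the DRAM module.
    """
    diff = 0
    for off in range(0, PAGE_SIZE, word_bytes):
        a = int.from_bytes(page_a[off:off + word_bytes], "little")
        b = int.from_bytes(page_b[off:off + word_bytes], "little")
        diff |= a ^ b  # any differing bit survives the OR-reduction
    return diff == 0

def bytes_to_cpu(num_comparisons: int, in_memory: bool) -> int:
    """Off-chip bytes moved for page comparisons under each scheme (toy model)."""
    if in_memory:
        return num_comparisons * 1            # one result byte per comparison
    return num_comparisons * 2 * PAGE_SIZE    # both pages read into the CPU

if __name__ == "__main__":
    zero = bytes(PAGE_SIZE)
    other = bytearray(zero)
    other[123] = 1
    print(pages_equal_in_memory(zero, zero), pages_equal_in_memory(zero, bytes(other)))  # True False
    print(bytes_to_cpu(1000, in_memory=False), bytes_to_cpu(1000, in_memory=True))       # 8192000 1000
```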

Towards Efficient On-Board Deployment of DNNs on Intelligent Autonomous Systems

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-15 DOI: 10.1109/ISVLSI.2019.00107

Alexandros Kouris, Stylianos I. Venieris, C. Bouganis

With their unprecedented performance in major AI tasks, deep neural networks (DNNs) have emerged as a primary building block of modern autonomous systems. Intelligent systems such as drones, mobile robots, and driverless cars largely base their perception, planning, and application-specific tasks on DNN models. Nevertheless, due to the nature of these applications, such systems require on-board local processing to retain their autonomy and meet latency and throughput constraints. In this respect, the large computational and memory demands of DNN workloads pose a significant barrier to their deployment on the resource- and power-constrained compute platforms available on board. This paper presents an overview of recent methods and hardware architectures that address the system-level challenges of modern DNN-enabled autonomous systems at both the algorithmic and hardware design levels. Spanning from latency-driven approximate computing techniques to high-throughput mixed-precision cascaded classifiers, the presented set of works paves the way for the on-board deployment of sophisticated DNN models on robots and autonomous systems.


Pages: 568-573

Citations: 4
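The mixed-precision cascaded classifiers mentioned in the survey above follow a general pattern that a short sketch can make concrete: a cheap, low-precision model answers confident inputs and escalates the rest to a more accurate model. The confidence-threshold rule, the toy models, and the latency formula below are illustrative assumptions rather than the surveyed designs.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], List[float]]  # returns class probabilities

def argmax(probs: List[float]) -> int:
    return max(range(len(probs)), key=probs.__getitem__)

def cascade_predict(x: Sequence[float], cheap: Model, accurate: Model,
                    threshold: float) -> Tuple[int, bool]:
    """Run the cheap (e.g. low-precision) model first; escalate to the accurate
    model if the cheap model's top probability is below threshold.
    Returns (label, escalated)."""
    probs = cheap(x)
    top = argmax(probs)
    if probs[top] >= threshold:
        return top, False
    return argmax(accurate(x)), True

def expected_latency(t_cheap: float, t_accurate: float, escalation_rate: float) -> float:
    """Every input pays for stage 1; only escalated inputs also pay for stage 2."""
    return t_cheap + escalation_rate * t_accurate

if __name__ == "__main__":
    cheap = lambda x: [0.9, 0.1] if x[0] > 0.5 else [0.55, 0.45]  # toy stand-in
    accurate = lambda x: [0.2, 0.8]                               # toy stand-in
    print(cascade_predict([0.9], cheap, accurate, 0.8))  # (0, False)
    print(cascade_predict([0.1], cheap, accurate, 0.8))  # (1, True)
    print(expected_latency(2.0, 10.0, 0.25))             # 4.5
```

The threshold trades accuracy against throughput: raising it escalates more inputs, which increases the expected latency linearly in the escalation rate.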
