
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): Latest Publications

RRAM-based reconfigurable in-memory computing architecture with hybrid routing
Pub Date : 2017-11-13 DOI: 10.5555/3199700.3199770
Yue Zha, J. Li
Recent advances in resistive random-access memory (RRAM) have sparked great interest in exploring alternative architectures. One interesting work is an RRAM-based reconfigurable architecture that provides superior programmability and blurs the boundary between computation and storage, but its long-distance routing becomes a performance bottleneck. FPGAs, in contrast, implement long-distance routing efficiently, but their fine-grained routing structure results in a large routing overhead. In this work, we present an RRAM-based reconfigurable architecture that addresses these routing challenges using hybrid routing, i.e., local and global routing that combines the strengths of both architectures (the prior RRAM-based architecture and FPGA). We also provide a complete CAD framework that exhibits high parallelism and good scalability. Experimental results show that our reconfigurable architecture outperforms both architectures. Compared with the prior RRAM-based architecture, it achieves a 46.88% reduction in delay and improves energy efficiency by 66.23%, at the cost of a slightly increased area overhead. Compared with FPGA, it reduces the delay and the routing overhead by 36.00% and 50.20%, respectively. Additionally, our CAD framework achieves a 5.39x speedup over the prior framework.
Citations: 0
Virtual persistent cache: Remedy the long latency behavior of host-aware shingled magnetic recording drives
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203755
Ming-Chang Yang, Yuan-Hao Chang, Fenggang Wu, Tei-Wei Kuo, D. Du
This paper presents a Virtual Persistent Cache design to remedy the long latency behavior of the Host-Aware Shingled Magnetic Recording (HA-SMR) drive. Our design keeps the cost-effective model of existing HA-SMR drives, but at the same time enlists the host system to adaptively provide some computing and management resources to improve drive performance when needed. The technical contribution is to trick HA-SMR drives by smartly reshaping the access patterns issued to them, so as to avoid long latencies in most cases and thus ultimately improve drive performance and responsiveness. We conduct experiments on real Seagate 8 TB HA-SMR drives to demonstrate the advantages of Virtual Persistent Cache on real workloads from Microsoft Research Cambridge. The results show that the proposed design can remedy most of the long latencies and improve drive performance by at least 58.11% under the evaluated workloads.
Citations: 16
Toward safe interoperations in network connected medical cyber-physical systems using open-loop safe protocols
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203884
Andrew Y.-Z. Ou, M. Rahmaniheris, Yu Jiang, Po-Liang Wu, L. Sha
Using wireless networks in medical Cyber-Physical Systems (CPS) can be challenging, because the medical system not only assists medical personnel in delivering medical services to the patient but also needs to deal with accidental situations such as communication failures without compromising the patient's safety. Previous research tackled communication failures in medical CPS from an architectural perspective. However, device configurations become more complex when a medical CPS is composed of many medical devices, so we need to know whether a given configuration and combination of devices could compromise the patient's safety. We present an algorithm that determines whether, for a given system configuration, there exists a series of system transitions that allows physicians to perform medical operations while ensuring the patient's safety even if communication failures happen during the transitions.
Citations: 1
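The question posed in the abstract above (does a series of system transitions exist that keeps the patient safe?) can be read as a reachability search restricted to safe states. The following is a minimal sketch under that reading, not the authors' algorithm: the state names, the `transitions` map, and the `is_safe` predicate are all hypothetical, and communication failures are assumed to be folded into `is_safe` by marking as unsafe any state that would be dangerous if the link were lost.

```python
from collections import deque


def find_safe_path(start, goal, transitions, is_safe):
    """Breadth-first search for a path from start to goal through safe states only.

    transitions: dict mapping a state to the states reachable in one step
    is_safe:     predicate returning True when a state does not endanger the patient
    Returns the states on a shortest safe path, or None if no such path exists.
    """
    if not is_safe(start):
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in transitions.get(state, ()):
            if nxt not in parent and is_safe(nxt):
                parent[nxt] = state
                queue.append(nxt)
    return None


# Hypothetical scenario: a ventilator must be paused before an X-ray is taken.
transitions = {
    "ventilating": ["paused", "xray_while_ventilating"],
    "paused": ["xray_taken"],
    "xray_while_ventilating": ["xray_taken"],
    "xray_taken": [],
}
unsafe = {"xray_while_ventilating"}
path = find_safe_path("ventilating", "xray_taken", transitions,
                      lambda s: s not in unsafe)
print(path)
```

A return value of `None` would indicate that the configuration admits no safe sequence of transitions under the stated assumptions.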
Edge segmentation: Empowering mobile telemedicine with compressed cellular neural networks
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203873
Xiaowei Xu, Q. Lu, Tianchen Wang, Jinglan Liu, Cheng Zhuo, X. Hu, Yiyu Shi
With the need for increased care and welfare of the rapidly aging population, mobile telemedicine is becoming popular for providing remote health care to increase the quality of life. Recently, image analysis has been actively applied to medical diagnosis and treatment, in which image segmentation is of fundamental importance for other image processing tasks such as visualization and detection. However, given the challenges of transmitting large volumes of high-resolution images and the real-time constraints that are commonly present in mobile telemedicine, image segmentation is best done at the “edge”, i.e., locally, so that only segmentation results are communicated. A powerful approach to medical image segmentation is the cellular neural network (CeNN), which can achieve very high accuracy through proper training. However, CeNNs typically involve extensive computations in a recursive manner. As an example, simply processing an image of 1920×1080 pixels requires 4–8 Giga floating point multiplications (for 3×3 templates and 50–100 iterations), which need to be done in a timely manner for real-time medical image segmentation. Such a demand is too high for most low-power mobile computing platforms in IoT. This paper presents a compressed CeNN framework for computation reduction in CeNNs, which is the first in the literature. It involves various techniques such as early exit and parameter quantization, which significantly reduce computation demands while maintaining an acceptable performance.
Citations: 18
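To make the recursive computation and the two compression ideas named in the abstract above more concrete, here is a small sketch of a discretised CeNN with 3×3 templates. It is an illustration under assumptions, not the paper's framework: the Euler step `dt`, the uniform rounding grid in `quantize`, the convergence-based early-exit rule, and the toy templates are all choices made for this example.

```python
def quantize(template, step=0.25):
    """Round every template coefficient to the nearest multiple of step (assumed scheme)."""
    return [[round(v / step) * step for v in row] for row in template]


def output(x):
    """Standard CeNN piecewise-linear output: clamp the state to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def conv3x3(template, grid, i, j):
    """Apply a 3x3 template centred on (i, j); cells outside the grid count as 0."""
    h, w = len(grid), len(grid[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < h and 0 <= c < w:
                total += template[di + 1][dj + 1] * grid[r][c]
    return total


def run_cenn(u, A, B, z, dt=0.1, max_iters=100, tol=1e-3):
    """Iterate x <- x + dt * (-x + A*y + B*u + z); return (outputs, iterations used)."""
    h, w = len(u), len(u[0])
    x = [[0.0] * w for _ in range(h)]
    # The feedforward term B*u + z is constant, so compute it once.
    bias = [[conv3x3(B, u, i, j) + z for j in range(w)] for i in range(h)]
    for it in range(1, max_iters + 1):
        y = [[output(v) for v in row] for row in x]
        new_x = [[x[i][j] + dt * (-x[i][j] + conv3x3(A, y, i, j) + bias[i][j])
                  for j in range(w)] for i in range(h)]
        change = max(abs(new_x[i][j] - x[i][j]) for i in range(h) for j in range(w))
        x = new_x
        if change < tol:  # early exit: the state has settled
            break
    return [[output(v) for v in row] for row in x], it


# Toy thresholding templates (hypothetical): only the centre coefficients are non-zero.
u = [[0.8, -0.6], [-0.6, 0.8]]
A = quantize([[0, 0, 0], [0, 2.1, 0], [0, 0, 0]])  # centre rounds to 2.0
B = quantize([[0, 0, 0], [0, 0.9, 0], [0, 0, 0]])  # centre rounds to 1.0
seg, iters = run_cenn(u, A, B, z=0.0)
print(seg, iters)
```

In this toy case the quantized templates produce the same segmentation as the unquantized ones, and the loop stops before the iteration cap; whether that holds for real templates and images depends on the training and data, which this sketch does not model.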
SAMG: Sparsified graph-theoretic algebraic multigrid for solving large symmetric diagonally dominant (SDD) matrices
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203832
Zhiqiang Zhao, Yongyu Wang, Zhuo Feng
Algebraic multigrid (AMG) is a class of high-performance linear solvers based on multigrid principles. Compared to geometric multigrid (GMG) solvers that rely on the geometric information of underlying problems, AMG solvers build hierarchical coarse-level problems according to the input matrices. Graph-theoretic AMG algorithms have emerged for solving large Symmetric Diagonally Dominant (SDD) matrices by taking advantage of the spectral properties of graph Laplacians. This paper proposes a Sparsified graph-theoretic Algebraic Multigrid (SAMG) framework that allows efficient construction of nearly-linear-sized graph Laplacians for coarse-level problems while maintaining good spectral approximation during the AMG setup phase by leveraging a scalable spectral graph sparsification engine. Our experimental results show that the proposed method can offer more scalable performance than existing graph-theoretic AMG solvers for solving large SDD matrices in integrated circuit (IC) simulations, 3D-IC thermal analysis, image processing, finite element analysis, as well as data mining and machine learning applications.
Citations: 17
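For readers unfamiliar with the two objects the abstract above relies on, the sketch below builds a graph Laplacian from a weighted edge list and checks the SDD property. It is only a definition-level illustration with a dense matrix and a hypothetical three-node graph; it does not implement multigrid or spectral sparsification.

```python
def laplacian(n, edges):
    """Dense Laplacian L = D - W of an undirected graph.

    edges: iterable of (u, v, weight) with positive weights and 0 <= u, v < n
    """
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L


def is_sdd(M, eps=1e-12):
    """True if M is symmetric and every diagonal entry is at least the sum of
    the absolute values of the off-diagonal entries in its row."""
    n = len(M)
    for i in range(n):
        off = 0.0
        for j in range(n):
            if abs(M[i][j] - M[j][i]) > eps:
                return False
            if j != i:
                off += abs(M[i][j])
        if M[i][i] < off - eps:
            return False
    return True


def quadratic_form(M, x):
    """Compute x^T M x, the quantity a spectral approximation tries to preserve."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical path graph 0 - 1 - 2 with edge weights 1 and 2.
edges = [(0, 1, 1.0), (1, 2, 2.0)]
L = laplacian(3, edges)
x = [1.0, 0.0, 2.0]
print(is_sdd(L), quadratic_form(L, x))
```

For a Laplacian, `quadratic_form(L, x)` equals the weighted sum of squared differences across edges, which is why dropping or reweighting edges changes the spectrum in a controlled way.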
Clepsydra: Modeling timing flows in hardware designs
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203772
Armaiti Ardeshiricham, Wei Hu, R. Kastner
The emergence of side-channel security attacks has challenged the classic assumptions regarding what data is publicly available. As demonstrated repeatedly, statistical analysis of information collected by measuring the completion time of hardware designs can reveal confidential information. Even though timing-based side-channel leakage can be easily exploited to breach data privacy, conventional hardware verification tools are not yet suited to assessing these vulnerabilities. To acquaint the hardware design process with formal security evaluations, we introduce a model for tracking timing-based information flows through HDL code. Based on this model, we have developed Clepsydra, a tool for automatically generating circuitry that tracks timing flows and generic logical flows within hardware designs in two distinct channels. The circuit generated by Clepsydra can be analyzed by EDA tools to detect timing leakage or formally prove constant execution time. We present proofs regarding the soundness and precision of the proposed model, along with results of employing Clepsydra to verify security properties on a variety of hardware units including crypto cores, bus architectures, caches, and arithmetic modules.
Citations: 29
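The kind of leak described in the abstract above, where completion time depends on secret data, can be illustrated with a software model of a byte-serial comparator. This is a hypothetical example written for this listing; it does not reproduce Clepsydra's tracking logic and only contrasts a data-dependent cycle count with a constant one.

```python
def compare_early_exit(secret, guess):
    """Byte-serial compare that stops at the first mismatch.

    Returns (match, cycles). The cycle count depends on how many leading
    bytes of the guess are correct, so completion time leaks the secret.
    """
    cycles = 0
    for s, g in zip(secret, guess):
        cycles += 1
        if s != g:
            return False, cycles
    return len(secret) == len(guess), cycles


def compare_constant_time(secret, guess):
    """Byte-serial compare that always scans every byte.

    For equal-length inputs the cycle count is independent of the data.
    """
    cycles = 0
    diff = 0
    for s, g in zip(secret, guess):
        cycles += 1
        diff |= s ^ g  # accumulate differences without branching on them
    return diff == 0 and len(secret) == len(guess), cycles


secret = b"key7"
for guess in (b"aaaa", b"keaa", b"key7"):
    print(guess, compare_early_exit(secret, guess), compare_constant_time(secret, guess))
```

An attacker who can observe the cycle count of the first variant learns how long the matching prefix is; the second variant gives the same count for every guess of the same length.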
ACQUA: Adaptive and cooperative quality-aware control for automotive cyber-physical systems
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203778
K. Vatanparvar, M. A. Faruque
Controllers in cyber-physical systems integrate a design-time behavioral model of the system under design to improve their own quality. In state-of-the-art control designs, behavioral models of other interacting neighbor systems are also integrated to form a centralized behavioral model and to enable system-level optimization and control. Although this ideal embedded control design may result in Pareto-optimal solutions, it does not scale to a larger number of systems. Moreover, the behavior of multi-domain physical systems may be too complex for a control designer to model and may change dynamically at run time. In this paper, we propose a novel Adaptive and Cooperative Quality-Aware (ACQUA) control design that addresses these challenges. In this control design, an ACQUA-based controller for the system under design monitors the quality of the neighbor systems to dynamically learn their behavior. It can therefore quickly adapt its control to cooperate with other neighbor controllers, improving the quality of not only itself but also the neighbor systems. We apply ACQUA to design a cooperative controller for the automotive navigation system, motor control unit, and battery management system in an electric vehicle. We use this automotive example to analyze the performance of the design. We show that by using our ACQUA control, we can attain up to 86% of the improvement achievable by an ideal embedded control design, reducing energy consumption by 18% and battery capacity loss by 12% on average compared to the state of the art.
Citations: 8
SAT-based compilation to a non-vonNeumann processor
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203842
S. Chaudhuri, A. Hetzel
This paper describes a compilation technique used to map dataflow computations, common in deep neural network computing, onto Coarse Grained Reconfigurable Array (CGRA) architectures for acceleration. This technique has been demonstrated to automatically compile dataflow programs onto a commercial massively parallel CGRA-based dataflow processor (DPU) containing 16000 processing elements. The DPU architecture overcomes the von Neumann bottleneck by spatially flowing and reusing data from local memories, and provides higher computation efficiency compared to temporal parallel architectures such as GPUs and multi-core CPUs. However, existing software development tools for CGRAs are limited to compiling domain-specific programs to processing elements with uniform structures, and are not effective on complex micro-architectures where memory access latencies vary in a nontrivial fashion depending on data locality. A primary contribution of this paper is a general algorithm that can compile general dataflow graphs and can efficiently utilize processing elements with rich micro-architectural features such as complex instructions, multi-precision data paths, local memories, register files, and switches. Another contribution is a uniquely innovative application of Boolean Satisfiability to formally solve this complex and irregular optimization problem and produce high-quality results comparable to hand-written assembly code produced by human experts. A third contribution is an adaptive windowing algorithm that harnesses the complexity of the SAT-based approach and delivers a scalable and robust solution.
Citations: 13
Cyclist: Accelerating hardware development
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203892
J. Bachrach, Albert Magyar, D. Dabbelt, Patrick Li, Richard Lin, K. Asanović
The end of Dennard scaling has led to an increase in demand for energy-efficient custom hardware accelerators, but current hardware design is slow and laborious, partly because each iteration of the compile-run-debug cycle can take hours or even days with existing simulation and emulation platforms. Cyclist is a new emulation platform designed specifically to shorten the total compile-run-debug cycle. The Cyclist toolflow converts a Chisel RTL design to a parallel dataflow graph, which is then mapped to the Cyclist hardware architecture, consisting of a tiled array of custom parallel emulation engines. Cyclist provides cycle-accurate/bit-accurate RTL emulation at speeds approaching FPGA emulation, but with compile time closer to software simulation. Cyclist provides full visibility and debuggability of the hardware design, including moving forwards and backwards in simulation time while searching for trigger events. The snapshot facility used for debugging is also used to provide a “pay-as-you-go” mapping strategy, which allows emulation to begin execution with a low-effort placement, while higher-quality emulation placements are optimized in the background and swapped into a running emulation. The Cyclist ASIC design requires 0.069 mm² per tile and runs at 2 GHz in a 45 nm CMOS process. Our evaluation demonstrates that Cyclist outperforms FPGA emulation, VCS, and C++ simulation on combined compile and run time for up to a billion cycles on a set of real-world hardware benchmarks.
Citations: 1
Hardening extended memory access control schemes with self-verified address spaces
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203804
J. Elwell, Dmitry Evtyushkin, D. Ponomarev, N. Abu-Ghazaleh, Ryan D. Riley
In this paper, we revisit the security properties of extended access control schemes that are used to protect application secrets from untrusted system software. We demonstrate that several recent proposals are vulnerable to a class of attacks we call mapping attacks. We argue that protection from such attacks requires verifying the integrity of the address space, and we propose the concept of self-verified address spaces (SVAS), in which the applications themselves are made aware of requested changes to the page mappings and are placed in charge of verifying them. SVAS equips an application with a customized verification model that has several attractive functional and performance properties. We implemented the attacks and a complete prototype of SVAS in Linux and the QEMU emulator. Our results demonstrate that SVAS can prevent mapping attacks on extended access control systems with minimal performance overhead, hardware modifications, and software complexity.
Citations: 0