Recent advances in resistive random-access memory (RRAM) have sparked great interest in exploring alternative architectures. One notable prior work is an RRAM-based reconfigurable architecture that provides superior programmability and blurs the boundary between computation and storage, but in which long-distance routing becomes a performance bottleneck. FPGAs, by contrast, implement long-distance routing efficiently, but their fine-grained routing structure incurs a large routing overhead. In this work, we present an RRAM-based reconfigurable architecture that addresses these routing challenges with hybrid routing, i.e., local and global routing that combines the strengths of both architectures (prior RRAM-based and FPGA). We also provide a complete CAD framework that exhibits high parallelism and good scalability. Experimental results show that our reconfigurable architecture outperforms both architectures. Compared with the prior RRAM-based architecture, it achieves a 46.88% reduction in delay and improves energy efficiency by 66.23%, with a slight increase in area overhead. Compared with FPGA, it reduces the delay and the routing overhead by 36.00% and 50.20%, respectively. Additionally, our CAD framework achieves a 5.39x speedup over the prior framework.
Title: "RRAM-based reconfigurable in-memory computing architecture with hybrid routing"; Authors: Yue Zha, J. Li; DOI: 10.5555/3199700.3199770; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203755
Ming-Chang Yang, Yuan-Hao Chang, Fenggang Wu, Tei-Wei Kuo, D. Du
This paper presents a Virtual Persistent Cache design to remedy the long-latency behavior of Host-Aware Shingled Magnetic Recording (HA-SMR) drives. Our design keeps the cost-effective model of existing HA-SMR drives while enlisting the host system to adaptively provide computing and management resources that improve drive performance when needed. The technical contribution is to reshape the access patterns issued to HA-SMR drives so that long latencies are avoided in most cases, ultimately improving drive performance and responsiveness. We conduct experiments on real Seagate 8 TB HA-SMR drives to demonstrate the advantages of Virtual Persistent Cache on real workloads from Microsoft Research Cambridge. The results show that the proposed design remedies most of the long latencies and improves drive performance by at least 58.11% under the evaluated workloads.
Title: "Virtual persistent cache: Remedy the long latency behavior of host-aware shingled magnetic recording drives"; Authors: Ming-Chang Yang, Yuan-Hao Chang, Fenggang Wu, Tei-Wei Kuo, D. Du; DOI: 10.1109/ICCAD.2017.8203755; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203884
Andrew Y.-Z. Ou, M. Rahmaniheris, Yu Jiang, Po-Liang Wu, L. Sha
Using wireless networks in medical cyber-physical systems (CPS) can be challenging, because the medical system must not only assist medical personnel in delivering medical services to the patient but also handle accidental situations such as communication failures without compromising the patient's safety. Previous research tackled communication failures in medical CPS from an architectural perspective. However, as device configurations become more complex when a medical CPS is composed of many medical devices, we need to know whether a given configuration and combination of devices will compromise the patient's safety. We present an algorithm that determines whether, for a given system configuration, there exists a series of system transitions that allows physicians to perform medical operations while ensuring the patient's safety even if communication failures occur during the transitions.
Title: "Toward safe interoperations in network connected medical cyber-physical systems using open-loop safe protocols"; Authors: Andrew Y.-Z. Ou, M. Rahmaniheris, Yu Jiang, Po-Liang Wu, L. Sha; DOI: 10.1109/ICCAD.2017.8203884; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13
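The question posed in the medical CPS abstract above, whether a configuration admits a series of transitions that stays safe, can be pictured as reachability in a state graph. The following is a minimal sketch under stated assumptions and not the authors' algorithm: the state names, the `transitions` map, and the `unsafe` set are hypothetical, and a communication failure is modelled only by requiring that every intermediate state be safe to remain in.

```python
from collections import deque

def find_safe_path(transitions, unsafe, start, goal):
    """Breadth-first search for a path from start to goal that never
    enters a state listed in `unsafe`. Returns the path, or None."""
    if start in unsafe or goal in unsafe:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in transitions.get(state, ()):
            if nxt not in unsafe and nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Hypothetical device-configuration states, for illustration only.
transitions = {
    "idle": ["ventilator_paused", "laser_armed"],
    "ventilator_paused": ["xray_taken"],
    "laser_armed": ["xray_taken"],
    "xray_taken": ["idle"],
}
unsafe = {"laser_armed"}
print(find_safe_path(transitions, unsafe, "idle", "xray_taken"))
```

If every route to the goal passes through an unsafe state, the function returns `None`, which corresponds to a configuration that should be rejected.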
With the need for increased care and welfare of the rapidly aging population, mobile telemedicine is becoming popular for providing remote health care to increase the quality of life. Recently, image analysis has been actively applied to medical diagnosis and treatment, in which image segmentation is of fundamental importance for other image processing tasks such as visualization and detection. However, given the challenges of transmitting large volumes of high-resolution images and the real-time constraints that are commonly present in mobile telemedicine, image segmentation is best done at the “edge”, i.e., locally, so that only segmentation results are communicated. A powerful approach to medical image segmentation is the cellular neural network (CeNN), which can achieve very high accuracy through proper training. However, CeNNs typically involve extensive computations in a recursive manner. As an example, simply processing an image of 1920×1080 pixels requires 4–8 Giga floating point multiplications (for 3×3 templates and 50–100 iterations), which need to be done in a timely manner for real-time medical image segmentation. Such a demand is too high for most low-power mobile computing platforms in the IoT. This paper presents a compressed CeNN framework for computation reduction in CeNNs, which is the first in the literature. It involves various techniques such as early exit and parameter quantization, which significantly reduce computation demands while maintaining acceptable performance.
Title: "Edge segmentation: Empowering mobile telemedicine with compressed cellular neural networks"; Authors: Xiaowei Xu, Q. Lu, Tianchen Wang, Jinglan Liu, Cheng Zhuo, X. Hu, Yiyu Shi; DOI: 10.1109/ICCAD.2017.8203873; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13
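To make the recursive CeNN computation in the abstract above concrete, here is a minimal sketch of the standard CeNN state update (Euler steps of dx/dt = -x + A*y + B*u + z with the piecewise-linear output y = 0.5(|x+1| - |x-1|)) on a tiny image. The early exit shown is a plain convergence test and is an assumption for illustration; the paper's own early-exit criterion and its quantization scheme are not reproduced here, and `dt`, `tol`, and the templates are hypothetical values.

```python
def conv3x3(img, k):
    """3x3 correlation with zero padding on a list-of-lists image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[di + 1][dj + 1] * img[ii][jj]
            out[i][j] = s
    return out

def saturate(x):
    """Standard CeNN output nonlinearity: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))

def cenn_run(u, A, B, z, dt=0.1, max_iters=100, tol=1e-3):
    """Iterate the CeNN state; stop early once the largest update is below tol.
    Returns (output image, number of iterations performed)."""
    h, w = len(u), len(u[0])
    x = [[0.0] * w for _ in range(h)]
    bu = conv3x3(u, B)  # the input term does not change between iterations
    it = 0
    for it in range(1, max_iters + 1):
        y = [[saturate(v) for v in row] for row in x]
        ay = conv3x3(y, A)
        delta = 0.0
        for i in range(h):
            for j in range(w):
                step = dt * (-x[i][j] + ay[i][j] + bu[i][j] + z)
                x[i][j] += step
                delta = max(delta, abs(step))
        if delta < tol:  # early exit (assumed criterion): state has settled
            break
    return [[saturate(v) for v in row] for row in x], it

# Hypothetical templates: no feedback, identity feedforward.
zero = [[0.0] * 3 for _ in range(3)]
ident = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
out, iters = cenn_run([[1.0, -1.0], [-1.0, 1.0]], zero, ident, 0.0)
print(iters, out)
```

Each iteration costs nine multiplications per pixel for the feedback template alone, which is why stopping before `max_iters` translates directly into saved work.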
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203832
Zhiqiang Zhao, Yongyu Wang, Zhuo Feng
Algebraic multigrid (AMG) is a class of high-performance linear solvers based on multigrid principles. Compared with geometric multigrid (GMG) solvers, which rely on the geometric information of the underlying problems, AMG solvers build hierarchical coarse-level problems from the input matrices. Graph-theoretic AMG algorithms have emerged for solving large symmetric diagonally dominant (SDD) matrices by taking advantage of the spectral properties of graph Laplacians. This paper proposes a Sparsified graph-theoretic Algebraic Multigrid (SAMG) framework that efficiently constructs nearly-linear-sized graph Laplacians for coarse-level problems while maintaining good spectral approximation during the AMG setup phase by leveraging a scalable spectral graph sparsification engine. Our experimental results show that the proposed method can offer more scalable performance than existing graph-theoretic AMG solvers when solving large SDD matrices in integrated circuit (IC) simulation, 3D-IC thermal analysis, image processing, finite element analysis, as well as data mining and machine learning applications.
Title: "SAMG: Sparsified graph-theoretic algebraic multigrid for solving large symmetric diagonally dominant (SDD) matrices"; Authors: Zhiqiang Zhao, Yongyu Wang, Zhuo Feng; DOI: 10.1109/ICCAD.2017.8203832; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13
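The link between graph Laplacians and SDD matrices mentioned in the SAMG abstract above can be checked on a small example. This sketch builds the Laplacian L = D - W of an undirected graph with non-negative edge weights and tests the SDD property; it illustrates the definitions only and does not touch sparsification or the multigrid hierarchy. The edge list is a hypothetical example.

```python
def laplacian(n, edges):
    """Dense graph Laplacian L = D - W for an undirected weighted graph.
    `edges` is a list of (u, v, weight) with 0-based vertex indices."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L

def is_sdd(M, eps=1e-12):
    """True if M is symmetric and M[i][i] >= sum over j != i of |M[i][j]|."""
    n = len(M)
    for i in range(n):
        for j in range(n):
            if M[i][j] != M[j][i]:
                return False
        off = sum(abs(M[i][j]) for j in range(n) if j != i)
        if M[i][i] + eps < off:
            return False
    return True

edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 0.5)]  # hypothetical weighted triangle
L = laplacian(3, edges)
print(L, is_sdd(L))
```

Every row of a Laplacian sums to zero, so the diagonal entry equals the sum of the off-diagonal magnitudes and the dominance condition holds with equality.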
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203772
Armaiti Ardeshiricham, Wei Hu, R. Kastner
The emergence of side-channel attacks has challenged the classic assumptions regarding what data is publicly available. As demonstrated repeatedly, statistical analysis of information collected by measuring the completion time of hardware designs can reveal confidential information. Even though timing-based side-channel leakage can be easily exploited to breach data privacy, conventional hardware verification tools are not yet suited to assess these vulnerabilities. To acquaint the hardware design process with formal security evaluations, we introduce a model for tracking timing-based information flows through HDL code. Based on this model, we have developed Clepsydra, a tool for automatically generating circuitry that tracks timing flows and generic logical flows within hardware designs in two distinct channels. The circuit generated by Clepsydra can be analyzed by EDA tools to detect timing leakage or formally prove constant execution time. We present proofs regarding the soundness and precision of the proposed model, along with results of employing Clepsydra to verify security properties on a variety of hardware units including crypto cores, bus architectures, caches, and arithmetic modules.
Title: "Clepsydra: Modeling timing flows in hardware designs"; Authors: Armaiti Ardeshiricham, Wei Hu, R. Kastner; DOI: 10.1109/ICCAD.2017.8203772; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13
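The kind of leak described in the Clepsydra abstract above, completion time that depends on secret data, can be illustrated with a software analogy. In this sketch the "completion time" is modelled as a step counter rather than a measured clock; it is not Clepsydra's tracking logic, which operates on HDL designs, and the function names and byte values are hypothetical.

```python
def compare_early_exit(secret, guess):
    """Compare byte strings, stopping at the first mismatch.
    Returns (equal, steps); steps reveals how long the matching prefix is."""
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return len(secret) == len(guess), steps

def compare_constant(secret, guess):
    """Compare byte strings while always inspecting every position.
    Returns (equal, steps); steps depends only on the input lengths."""
    steps = 0
    diff = len(secret) ^ len(guess)
    for s, g in zip(secret, guess):
        steps += 1
        diff |= s ^ g  # accumulate differences without branching on them
    return diff == 0, steps

secret = bytes([0x12, 0x34, 0x56, 0x78])
for guess in (bytes(4), bytes([0x12, 0x34, 0x00, 0x00])):
    print(compare_early_exit(secret, guess), compare_constant(secret, guess))
```

In the early-exit version the step count grows with the number of correct leading bytes, so an observer who can time the operation learns about the secret; the second version takes the same number of steps for both guesses.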
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203778
K. Vatanparvar, M. A. Faruque
Controllers in cyber-physical systems integrate a design-time behavioral model of the system under design to improve their own quality. In state-of-the-art control designs, behavioral models of other interacting neighbor systems are also integrated to form a centralized behavioral model and to enable system-level optimization and control. Although this ideal embedded control design may result in Pareto-optimal solutions, it does not scale to a larger number of systems. Moreover, the behavior of multi-domain physical systems may be too complex for a control designer to model and may change dynamically at run time. In this paper, we propose a novel Adaptive and Cooperative Quality-Aware (ACQUA) control design that addresses these challenges. In this control design, an ACQUA-based controller for the system under design monitors the quality of the neighbor systems to dynamically learn their behavior. It can therefore quickly adapt its control to cooperate with neighbor controllers, improving the quality of not only its own system but also the neighbor systems. We apply ACQUA to design a cooperative controller for the automotive navigation system, motor control unit, and battery management system in an electric vehicle, and we use this automotive example to analyze the performance of the design. We show that with ACQUA control we can reach up to 86% of the improvement achievable by an ideal embedded control design, such that on average energy consumption is reduced by 18% and battery capacity loss by 12% compared with the state of the art.
Title: "ACQUA: Adaptive and cooperative quality-aware control for automotive cyber-physical systems"; Authors: K. Vatanparvar, M. A. Faruque; DOI: 10.1109/ICCAD.2017.8203778; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203842
S. Chaudhuri, A. Hetzel
This paper describes a compilation technique used to map dataflow computations, common in deep neural network computing, onto Coarse Grained Reconfigurable Array (CGRA) architectures. This technique has been demonstrated to automatically compile dataflow programs onto a commercial, massively parallel CGRA-based dataflow processor (DPU) containing 16000 processing elements. The DPU architecture overcomes the von Neumann bottleneck by spatially flowing and reusing data from local memories, and provides higher computation efficiency than temporal parallel architectures such as GPUs and multi-core CPUs. However, existing software development tools for CGRAs are limited to compiling domain-specific programs to processing elements with uniform structures, and are not effective on complex microarchitectures where memory access latencies vary in a nontrivial fashion depending on data locality. A primary contribution of this paper is a general algorithm that can compile general dataflow graphs and can efficiently utilize processing elements with rich micro-architectural features such as complex instructions, multi-precision data paths, local memories, register files, switches, etc. Another contribution is an application of Boolean Satisfiability to formally solve this complex and irregular optimization problem and produce high-quality results comparable to hand-written assembly code produced by human experts. A third contribution is an adaptive windowing algorithm that harnesses the complexity of the SAT-based approach and delivers a scalable and robust solution.
Title: "SAT-based compilation to a non-vonNeumann processor"; Authors: S. Chaudhuri, A. Hetzel; DOI: 10.1109/ICCAD.2017.8203842; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13
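As a toy illustration of how a mapping problem like the one in the abstract above can be phrased as Boolean Satisfiability, the sketch below encodes "place each operation on exactly one processing element, with no element shared" as CNF clauses. It is an assumption-laden miniature and not the paper's encoding: the variable numbering, the `forbidden` pairs, and the brute-force `solve` routine (standing in for a real SAT solver) are all hypothetical.

```python
from itertools import product

def solve(num_vars, clauses):
    """Brute-force SAT: return the first satisfying assignment, or None.
    Literals are nonzero ints: +v means variable v is true, -v means false."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return bits
    return None

def var(op, pe, num_pes):
    """1-based variable index for 'operation op is placed on element pe'."""
    return op * num_pes + pe + 1

def placement_cnf(num_ops, num_pes, forbidden=()):
    clauses = []
    for op in range(num_ops):
        # Each operation is placed on at least one element ...
        clauses.append([var(op, pe, num_pes) for pe in range(num_pes)])
        # ... and on at most one element.
        for a in range(num_pes):
            for b in range(a + 1, num_pes):
                clauses.append([-var(op, a, num_pes), -var(op, b, num_pes)])
    # No element hosts two operations.
    for pe in range(num_pes):
        for o1 in range(num_ops):
            for o2 in range(o1 + 1, num_ops):
                clauses.append([-var(o1, pe, num_pes), -var(o2, pe, num_pes)])
    # Hypothetical capability constraints: op cannot run on pe.
    for op, pe in forbidden:
        clauses.append([-var(op, pe, num_pes)])
    return clauses

def place(num_ops, num_pes, forbidden=()):
    """Return {op: pe} for a satisfying placement, or None if none exists."""
    bits = solve(num_ops * num_pes, placement_cnf(num_ops, num_pes, forbidden))
    if bits is None:
        return None
    return {op: pe for op in range(num_ops) for pe in range(num_pes)
            if bits[var(op, pe, num_pes) - 1]}

print(place(2, 2, forbidden=[(0, 0)]))
```

Exhaustive enumeration only works for a handful of variables; the point of the example is the shape of the clauses, which an actual solver would consume in the same form.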
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203892
J. Bachrach, Albert Magyar, D. Dabbelt, Patrick Li, Richard Lin, K. Asanović
The end of Dennard scaling has led to an increase in demand for energy-efficient custom hardware accelerators, but current hardware design is slow and laborious, partly because each iteration of the compile-run-debug cycle can take hours or even days with existing simulation and emulation platforms. Cyclist is a new emulation platform designed specifically to shorten the total compile-run-debug cycle. The Cyclist toolflow converts a Chisel RTL design to a parallel dataflow graph, which is then mapped to the Cyclist hardware architecture, consisting of a tiled array of custom parallel emulation engines. Cyclist provides cycle-accurate/bit-accurate RTL emulation at speeds approaching FPGA emulation, but with compile time closer to software simulation. Cyclist provides full visibility and debuggability of the hardware design, including moving forwards and backwards in simulation time while searching for trigger events. The snapshot facility used for debugging is also used to provide a “pay-as-you-go” mapping strategy, which allows emulation to begin execution with a low-effort placement, while higher-quality emulation placements are optimized in the background and swapped in to a running emulation. The Cyclist ASIC design requires 0.069 mm² per tile and runs at 2 GHz in a 45 nm CMOS process. Our evaluation demonstrates that Cyclist outperforms FPGA emulation, VCS, and C++ simulation on combined compile and run time for up to a billion cycles for a set of real-world hardware benchmarks.
Title: "Cyclist: Accelerating hardware development"; Authors: J. Bachrach, Albert Magyar, D. Dabbelt, Patrick Li, Richard Lin, K. Asanović; DOI: 10.1109/ICCAD.2017.8203892; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13
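The "combined compile and run time" metric in the Cyclist abstract above can be read as total time = compile time + cycles / emulation rate. The sketch below computes the cycle count at which two platforms tie under that model. All numbers are hypothetical placeholders chosen for illustration and are not measurements from the paper.

```python
def total_seconds(compile_s, rate_hz, cycles):
    """Combined compile and run time for a given number of emulated cycles."""
    return compile_s + cycles / rate_hz

def break_even_cycles(a, b):
    """Each argument is (compile_seconds, cycles_per_second).
    Returns the cycle count where both totals are equal, or None if the
    two platforms never tie for a positive cycle count."""
    (ca, ra), (cb, rb) = a, b
    per_cycle_gap = 1.0 / ra - 1.0 / rb
    if per_cycle_gap == 0:
        return None
    n = (cb - ca) / per_cycle_gap
    return n if n > 0 else None

# Hypothetical platforms: quick to compile but slow, versus slow to compile but fast.
software_sim = (60.0, 1e3)     # 1 minute compile, 1 kHz
fpga_emul = (3600.0, 1e6)      # 1 hour compile, 1 MHz
n = break_even_cycles(software_sim, fpga_emul)
print(round(n), total_seconds(*software_sim, n), total_seconds(*fpga_emul, n))
```

Below the break-even point the platform with the shorter compile time wins despite its lower rate, which is why a platform with both short compiles and high rates is attractive across the whole range of run lengths.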
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203804
J. Elwell, Dmitry Evtyushkin, D. Ponomarev, N. Abu-Ghazaleh, Ryan D. Riley
In this paper we revisit the security properties of extended access control schemes that are used to protect application secrets from untrusted system software. We demonstrate the vulnerability of several recent proposals to a class of attacks we call mapping attacks. We argue that protection from such attacks requires verification of the address space integrity and propose the concept of self-verified address spaces (SVAS), where the applications themselves are made aware of the requested changes in the page mappings and are placed in charge of verifying them. SVAS equips an application with a customized verification model with several attractive functional and performance properties. We implemented the attacks and a complete prototype of SVAS in Linux and the QEMU emulator. Our results demonstrate that SVAS can prevent mapping attacks on extended access control systems with minimal performance overhead, hardware modifications and software complexity.
Title: "Hardening extended memory access control schemes with self-verified address spaces"; Authors: J. Elwell, Dmitry Evtyushkin, D. Ponomarev, N. Abu-Ghazaleh, Ryan D. Riley; DOI: 10.1109/ICCAD.2017.8203804; Venue: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD); Published: 2017-11-13