2020 57th ACM/IEEE Design Automation Conference (DAC): Latest Papers

AXI HyperConnect: A Predictable, Hypervisor-level Interconnect for Hardware Accelerators in FPGA SoC
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218652
Francesco Restuccia, Alessandro Biondi, Mauro Marinoni, Giorgiomaria Cicero, G. Buttazzo
FPGA-based systems-on-chip (SoCs) are powerful computing platforms for implementing mixed-criticality systems that require both multiprocessing and hardware acceleration. Virtualization via hypervisor technologies is, de facto, an effective technique for letting multiple execution domains with different criticality levels coexist in isolation on the same platform. Implementing such technologies on FPGA-based SoCs poses new challenges: one of them is the isolation of hardware accelerators deployed on the FPGA fabric that belong to different domains but share common resources such as a memory bus. This paper proposes AXI HyperConnect, a hypervisor-level hardware component that allows hardware accelerators to be connected to the same bus while ensuring isolation and predictability. AXI HyperConnect has been implemented on modern FPGA SoCs by Xilinx and tested with real-world accelerators, including one for Deep Neural Network inference.
Citations: 29
Late Breaking Results: Can You Hear Me? Towards an Ultra Low-Cost Hearing Screening Device
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218597
Nils Heitmann, Philipp H. Kindt, S. Chakraborty
Hearing screening devices emit an acoustic signal in the outer ear, which evokes a specific response from a healthy inner ear. However, the high cost of such devices prevents their wide deployment in schools or private homes, especially in developing countries. In this paper, we show for the first time that such tests are also feasible with a device that uses a single speaker both to emit the signal and, acting as a microphone, to record the response. Existing devices rely on a speaker and microphone pair, which makes them significantly more complex and costly. We further outline the embedded systems and signal processing challenges that such a setup entails. If successful, it has the potential to make hearing screening available to a much wider population in developing countries.
Citations: 0
Verification for Field-coupled Nanocomputing Circuits
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218641
Marcel Walter, R. Wille, F. Sill, Daniel Große, R. Drechsler
With the decline of Moore’s Law, several post-CMOS technologies are currently under heavy consideration. Promising candidates can be found in the class of Field-coupled Nanocomputing (FCN) devices, as they allow for the highest processing performance with tremendously low energy dissipation. With design automation emerging in this domain, the need for formal verification approaches arises. Unfortunately, FCN circuits come with certain domain-specific properties that render conventional verification methods inapplicable. In this paper, we investigate this issue and propose a verification approach for FCN circuits that addresses it. For the first time, this provides researchers and engineers with an automatic method for checking whether an obtained FCN circuit design indeed implements the given/desired function. A prototype implementation demonstrates the applicability of the proposed approach.
Citations: 10
Runtime Trust Evaluation and Hardware Trojan Detection Using On-Chip EM Sensors
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218514
Jiaji He, Xiaolong Guo, Haocheng Ma, Yanjiang Liu, Yiqiang Zhao, Yier Jin
It has been widely demonstrated that post-deployment trust evaluation approaches, such as side-channel measurements, combined with statistical analysis methods are effective for detecting hardware Trojans in fabricated integrated circuits (ICs). However, more sophisticated Trojans proposed recently invalidate these methods with stealthy triggers and very low side-channel signatures. To address these challenges, in this paper we propose an electromagnetic (EM) side-channel based post-fabrication trust evaluation framework that monitors EM radiation at runtime. The key component of the runtime trust evaluation framework is an on-chip EM sensor that can constantly measure and collect EM side-channel information of the target circuit. The simulation results validate the capability of the proposed framework in detecting stealthy hardware Trojans. Further, we fabricate an AES circuit protected by the proposed trust evaluation framework along with four different types of hardware Trojans. The measurements on the fabricated chips demonstrate two key findings. First, the on-chip EM sensor can achieve a higher signal-to-noise ratio (SNR) and thus facilitate better Trojan detection accuracy. Second, the trust evaluation framework can help detect different hardware Trojans at runtime.
Citations: 8
Access Characteristic Guided Partition for Read Performance Improvement on Solid State Drives
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218540
Yina Lv, Liang Shi, Qiao Li, C. Xue, E. Sha
Solid state drives (SSDs) are now widely deployed due to the development of high-density and low-cost NAND flash memories. Previous work has found that the read performance of SSDs degrades as these memories evolve. One of the most critical reasons is the access interference between reads and writes, as the latest NAND flash memories have a significant latency gap between reads and writes. This paper addresses this issue with the assistance of access characteristic guided SSD partitioning. First, several server workloads are studied and it is shown that reads and writes can be separated based on their access characteristics. Second, a set of techniques is proposed to place data judiciously for request separation. Finally, a workload-based SSD partitioning scheme is proposed to improve the read performance. The experimental results show that the proposed solution can improve read performance by 36% on average compared with the state-of-the-art solutions.
Citations: 3
Invited: Software Defined Accelerators From Learning Tools Environment
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218489
Antonino Tumeo, Marco Minutoli, Vito Giovanni Castellana, J. Manzano, Vinay C. Amatya, D. Brooks, Gu-Yeon Wei
Next-generation systems, such as edge devices, will need to provide efficient processing of machine learning (ML) algorithms along several metrics, including energy, performance, area, and latency. However, the quickly evolving field of ML makes it extremely difficult to generate accelerators able to support a wide variety of algorithms. At the same time, designing accelerators in hardware description languages (HDLs) by hand is hard and time-consuming, and does not allow quick exploration of the design space. In this paper we present the Software Defined Accelerators From Learning Tools Environment (SODALITE), an automated open-source high-level ML framework-to-Verilog compiler targeting ML Application-Specific Integrated Circuit (ASIC) chiplets. The SODALITE approach will implement optimal designs by seamlessly combining custom components generated through high-level synthesis (HLS) with templated and fully tunable Intellectual Properties (IPs) and macros, integrated in an extendable resource library. Through a closed-loop design space exploration engine, developers will be able to quickly explore their hardware designs along different dimensions.
Citations: 2
Centaur: Hybrid Processing in On/Off-chip Memory Architecture for Graph Analytics
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218624
Abraham Addisie, V. Bertacco
The increased use of graph algorithms in diverse fields has highlighted their inefficiencies in current chip-multiprocessor (CMP) architectures, primarily due to their seemingly random access patterns to off-chip memory. Recently, two families of solutions have been proposed: 1) solutions that offload operations generated by all vertices from the processor cores to off-chip memory; and 2) solutions that offload only operations generated by high-degree vertices to dedicated on-chip memory, while the cores continue to process the work related to the remaining vertices. Neither approach is optimal over the full range of vertex degrees. Thus, in this work, we propose Centaur, a novel architecture that processes operations on vertex data in on- and off-chip memory. Centaur utilizes a vertex’s degree as a proxy to determine whether to process related operations in on- or off-chip memory. Centaur manages to provide up to 4.0× improvement in performance and 3.8× in energy benefits, compared to a baseline CMP, and up to a 2.0× performance boost over state-of-the-art specialized solutions.
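The degree-as-proxy idea in this abstract can be illustrated with a small software sketch. This is only an illustration under stated assumptions, not Centaur's hardware mechanism: the function names `split_by_degree` and `route_update`, the `on_chip_slots` capacity parameter, and the choice to send the highest-degree vertices on-chip are all hypothetical and not taken from the paper.

```python
def split_by_degree(degrees, on_chip_slots):
    """Assign the highest-degree vertices to on-chip memory, the rest to off-chip.

    degrees: dict mapping vertex id -> degree.
    on_chip_slots: hypothetical number of vertices the on-chip memory can hold.
    """
    # Rank by descending degree; break ties by vertex id for determinism.
    ranked = sorted(degrees, key=lambda v: (-degrees[v], v))
    on_chip = set(ranked[:on_chip_slots])
    off_chip = set(ranked[on_chip_slots:])
    return on_chip, off_chip


def route_update(vertex, on_chip):
    """Return which memory side would process an update to `vertex` (assumed policy)."""
    return "on-chip" if vertex in on_chip else "off-chip"


if __name__ == "__main__":
    degrees = {"a": 5, "b": 1, "c": 3, "d": 1}
    on_chip, off_chip = split_by_degree(degrees, on_chip_slots=2)
    print(sorted(on_chip), sorted(off_chip))
    print(route_update("b", on_chip))
```

A fixed slot count is the simplest possible policy; a degree threshold would serve equally well for illustration.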
Citations: 6
Developing Privacy-preserving AI Systems: The Lessons learned
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218662
Huili Chen, S. Hussain, Fabian Boemer, Emmanuel Stapf, A. Sadeghi, F. Koushanfar, Rosario Cammarota
Advances in customer data privacy laws create pressures and pain points across the entire lifecycle of AI products. Practitioners such as data scientists and data engineers need to account for the correct use of privacy-enhancing technologies such as homomorphic encryption, secure multi-party computation, and trusted execution environments when they develop, test, and deploy products embedding AI models while providing data protection guarantees. In this work, we share the lessons learned during the development of frameworks that help data scientists and data engineers map their optimized workloads onto privacy-enhancing technologies seamlessly and correctly.
Citations: 8
CryptoPIM: In-memory Acceleration for Lattice-based Cryptographic Hardware
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218730
Hamid Nejatollahi, Saransh Gupta, M. Imani, T. Simunic, Rosario Cammarota, N. Dutt
Quantum computers promise to solve hard mathematical problems such as integer factorization and discrete logarithms in polynomial time, making standardized public-key cryptosystems insecure. Lattice-Based Cryptography (LBC) is a promising post-quantum public-key cryptographic protocol that could replace standardized public-key cryptography, thanks to its inherent quantum-resistant properties, efficiency, and versatility. A key mathematical tool in LBC is the Number Theoretic Transform (NTT), a common method to compute polynomial multiplication. It is the most compute-intensive routine and requires acceleration for practical deployment of LBC protocols. In this paper, we propose CryptoPIM, a high-throughput Processing In-Memory (PIM) accelerator for NTT-based polynomial multiplication that supports polynomials of degree up to 32k. Compared to the fastest FPGA implementation of an NTT-based multiplier, CryptoPIM achieves on average 31x throughput improvement with the same energy and only 28% performance reduction, thereby showing promise for practical deployment of LBC.
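For readers unfamiliar with the NTT, the following minimal software sketch shows how it turns polynomial multiplication into a pointwise product. It is a generic textbook construction, not CryptoPIM's design: the modulus 998244353, its primitive root 3, and the zero-padded cyclic convolution are assumptions chosen for simplicity, whereas lattice-based schemes typically work in a ring such as Z_q[x]/(x^n + 1) with a scheme-specific q.

```python
MOD = 998244353  # NTT-friendly prime 119 * 2**23 + 1 (assumed, not from the paper)
ROOT = 3         # a primitive root modulo MOD


def ntt(values, invert=False):
    """Return the forward (or inverse) NTT of a list whose length is a power of two."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Iterative Cooley-Tukey butterflies over blocks of doubling length.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse via Fermat
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def poly_mul(f, g):
    """Multiply two coefficient lists (lowest degree first) modulo MOD."""
    out_len = len(f) + len(g) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fa = ntt(f + [0] * (size - len(f)))
    ga = ntt(g + [0] * (size - len(g)))
    product = [x * y % MOD for x, y in zip(fa, ga)]
    return ntt(product, invert=True)[:out_len]


if __name__ == "__main__":
    # (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
    print(poly_mul([1, 2, 3], [4, 5]))
```

The transform costs O(n log n) operations versus O(n^2) for schoolbook multiplication, which is why it dominates the runtime that accelerators of this kind target.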
Citations: 28
[Copyright notice]
Pub Date : 2020-07-01 DOI: 10.1109/dac18072.2020.9218731
Citations: 0