
Latest publications: 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)

Ensemble Learning for Effective Run-Time Hardware-Based Malware Detection: A Comprehensive Analysis and Classification
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196047
H. Sayadi, Nisarg Patel, Sai Manoj P D, Avesta Sasan, S. Rafatirad, H. Homayoun
Malware detection at the hardware level has emerged recently as a promising solution to improve the security of computing systems. Hardware-based malware detectors take advantage of Machine Learning (ML) classifiers to detect patterns of malicious applications at run-time. These ML classifiers are trained on low-level features such as processor Hardware Performance Counter (HPC) data, captured at run-time to represent the application behaviour. Recent studies show the potential of standard ML-based classifiers for detecting malware by analyzing a large number of microarchitectural events, far more than the very limited number of HPC registers (2 to 8) available in today's microprocessors. This requires executing the application more than once to collect the required data, which in turn makes the solution less practical for effective run-time malware detection. Our results show a clear trade-off between the performance of standard ML classifiers and the number and diversity of HPCs available in modern microprocessors. This paper proposes a machine learning-based solution that breaks this trade-off to realize effective run-time detection of malware. We propose ensemble learning techniques that improve the performance of hardware-based malware detectors despite using only the very small number of microarchitectural events that existing HPCs can capture at run-time, eliminating the need to run an application several times. For this purpose, eight robust machine learning models and two well-known ensemble learning classifiers applied to all studied ML models (sixteen in total) are implemented for malware detection and precisely compared and characterized in terms of detection accuracy, robustness, performance (accuracy × robustness), and hardware overheads.
The experimental results show that the proposed ensemble learning-based malware detection with just 2 HPCs outperforms standard classifiers with 8 HPCs by up to 17%. In addition, it matches the robustness and performance of standard ML-based detectors that use 16 HPCs while requiring only 4, allowing effective run-time detection of malware.
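The ensemble idea from the abstract can be illustrated with a toy majority-vote detector. This is not the authors' implementation: the HPC feature names, thresholds, and sample values below are hypothetical, chosen only to show how voting over weak per-counter classifiers works.

```python
from collections import Counter

# Toy majority-vote ensemble over synthetic HPC-style features.
# Feature vectors are hypothetical [branch_misses, cache_misses] counts.

def make_threshold_clf(feature_idx, threshold):
    """Label a sample malicious (1) if one HPC feature exceeds a threshold."""
    return lambda x: 1 if x[feature_idx] > threshold else 0

def ensemble_predict(classifiers, x):
    """Majority vote across the base classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three weak detectors over just 2 HPC events (indices 0 and 1).
clfs = [
    make_threshold_clf(0, 50),   # elevated branch-miss count
    make_threshold_clf(1, 80),   # elevated cache-miss count
    make_threshold_clf(0, 120),  # extreme branch-miss count
]

benign = [30, 40]    # low counter values
malware = [140, 90]  # elevated counter values

print(ensemble_predict(clfs, benign))   # 0
print(ensemble_predict(clfs, malware))  # 1
```

The point mirrored from the paper: combining several weak views of only two counters can outperform a single classifier that needs many more counters.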
Citations: 109
Edge-Cloud Collaborative Processing for Intelligent Internet of Things: A Case Study on Smart Surveillance
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196036
B. Mudassar, Jong Hwan Ko, S. Mukhopadhyay
Limited processing power and memory prevent realization of state-of-the-art algorithms at the edge. Offloading computations to the cloud comes with trade-offs, as the compression techniques employed to conserve transmission bandwidth and energy adversely impact the accuracy of the algorithm. In this paper, we propose collaborative processing that actively guides the output of the sensor to improve performance on the end application. We apply this methodology to smart surveillance, specifically the task of object detection from video. Perceptual quality and object detection performance are characterized and improved under a variety of channel conditions.
Citations: 24
Canonical Computation without Canonical Representation
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196006
A. Mishchenko, R. Brayton, A. Petkovska, Mathias Soeken, L. Amarù, A. Domic
A representation of a Boolean function is canonical if, given a variable order, only one instance of the representation is possible for the function. A computation is canonical if the result depends only on the Boolean function and a variable order, and does not depend on how the function is represented or how the computation is implemented. In the context of Boolean satisfiability (SAT), canonicity of the computation implies that the result (a satisfying assignment for satisfiable instances, and an abstraction of the unsat core for unsatisfiable instances) does not depend on the functional representation or the SAT solver used. This paper shows that SAT-based computations can be made canonical even though the SAT solver does not use a canonical data structure. This brings advantages in EDA applications, such as irredundant sum-of-products (ISOP) computation and counter-example minimization, where the uniqueness of solutions and/or improved quality of results justify a runtime overhead.
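The notion of a canonical object can be sketched independently of SAT: under a fixed variable order, a fully enumerated truth table is canonical, so any two representations of the same function yield the same signature. This is a minimal illustration of canonicity, not the paper's SAT-based procedure.

```python
from itertools import product

def truth_table(f, num_vars):
    """Canonical signature: outputs over all inputs in a fixed variable order."""
    return tuple(f(*bits) for bits in product((0, 1), repeat=num_vars))

# Two syntactically different representations of the same function, a XOR b.
rep1 = lambda a, b: (a | b) & ~(a & b) & 1
rep2 = lambda a, b: a ^ b

# Same function, same variable order -> identical canonical signature.
print(truth_table(rep1, 2) == truth_table(rep2, 2))  # True
print(truth_table(rep2, 2))                          # (0, 1, 1, 0)
```

The paper's contribution is making a SAT-based result behave like this signature, i.e., depend only on the function and the variable order, without the solver ever building such a canonical structure.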
Citations: 2
Context-Aware Dataflow Adaptation Technique for Low-Power Multi-Core Embedded Systems
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196015
Hyeonseok Jung, Hoeseok Yang
Today’s embedded systems operate under increasingly dynamic conditions. First, computational workloads can be either fluctuating or adjustable. Moreover, as many devices are battery-powered, it is common to employ runtime power management techniques, which result in a dynamic power budget. This paper presents a design methodology for multi-core systems, based on dataflow specification, that can deal with such varying contexts. We optimize the original dataflow considering various working conditions, then autonomously adapt it to a pre-defined optimal form in response to context changes. We show the effectiveness of the proposed technique with a real-life case study and synthetic benchmarks.
Citations: 1
Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3195999
J. Cong, Peng Wei, Cody Hao Yu, Peng Zhang
CPU-FPGA heterogeneous architectures feature flexible acceleration of many workloads to advance computational capabilities and energy efficiency in today’s datacenters. This advantage, however, is often overshadowed by the poor programmability of FPGAs. Although recent advances in high-level synthesis (HLS) significantly improve FPGA programmability, programmers still face the challenge of identifying the optimal design configuration in a tremendous design space. In this paper, we propose the composable, parallel and pipeline (CPP) microarchitecture as an accelerator design template that substantially reduces the design space. Also, by introducing the CPP analytical model to capture the performance-resource trade-offs, we achieve efficient, analytical design space exploration. Furthermore, we develop the AutoAccel framework to automate the entire accelerator generation process. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels.
Citations: 49
Automated Interpretation and Reduction of In-Vehicle Network Traces at a Large Scale
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196000
Artur Mrowca, Thomas Pramsohler, S. Steinhorst, U. Baumgarten
In modern vehicles, high communication complexity requires cost-effective integration tests such as data-driven system verification with in-vehicle network traces. With the growing amount of traces, distributable Big Data solutions become essential to inspect massive amounts of traces. Such traces need to be processed systematically using automated procedures, as manual steps become infeasible due to loading and processing times in existing tools. Further, trace analyses are carried out by multiple domains to verify different aspects of the system (e.g., specific functions), and thus require solutions that can be parameterized for the respective domain. Existing solutions are not able to process such volumes of traces in a flexible and automated manner. To overcome this, we introduce a fully automated and parallelizable end-to-end preprocessing framework that allows massive in-vehicle network traces to be analyzed. Parameterized per domain, the framework systematically reduces trace data and extends it with domain knowledge, yielding a representation targeted towards domain-specific system analyses. We show that our approach outperforms existing solutions in terms of execution time and extensibility by evaluating it on three real-world data sets from the automotive industry.
Citations: 5
Design and Architectural Co-optimization of Monolithic 3D Liquid State Machine-based Neuromorphic Processor
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196024
B. W. Ku, Yu Liu, Yingyezhe Jin, S. Samal, Peng Li, S. Lim
A liquid state machine (LSM) is a powerful recurrent spiking neural network shown to be effective in various learning tasks, including speech recognition. In this work, we investigate design and architectural co-optimization to further improve the area-energy efficiency of LSM-based speech recognition processors with monolithic 3D IC (M3D) technology. We conduct fine-grained tier partitioning, where individual neurons are folded, and explore the impact of shared memory architecture and synaptic model complexity on the power-performance-area-accuracy (PPAA) benefit of M3D LSM-based speech recognition. In training and classification tasks using spoken English letters, we obtain up to 70.0% PPAA savings over 2D ICs.
Citations: 9
TAO: Techniques for Algorithm-Level Obfuscation during High-Level Synthesis
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196126
C. Pilato, F. Regazzoni, R. Karri, S. Garg
Intellectual Property (IP) theft costs semiconductor design companies billions of dollars every year. Unauthorized IP copies start from reverse engineering the given chip. Existing techniques to protect against IP theft aim to hide the IC’s functionality, but focus on manipulating the HDL descriptions. We propose TAO, a comprehensive solution based on high-level synthesis that raises the abstraction level and applies algorithmic obfuscation automatically. TAO includes several transformations that make the component hard to reverse engineer during chip fabrication; a key is later inserted to unlock the functionality. This is a promising approach to obfuscating large-scale designs, despite the hardware overhead needed to implement the obfuscation.
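The key-unlock idea can be sketched as a toy XOR-based lock. TAO's actual HLS-level transformations are far richer; the key value and 4-bit adder below are hypothetical, chosen only to show that any wrong key corrupts the computed function.

```python
SECRET_KEY = 0b1011  # assumed 4-bit unlock key (illustrative value)

def locked_add(a, b, key):
    """Obfuscated 4-bit adder: the key is XORed into the datapath,
    so the circuit computes a+b only when key == SECRET_KEY."""
    return ((a + b) ^ key ^ SECRET_KEY) & 0xF

print(locked_add(3, 4, SECRET_KEY))  # 7 (correct key unlocks the function)
print(locked_add(3, 4, 0b0000))      # 12 (wrong key yields a corrupted result)
```

A reverse engineer who recovers the netlist without the key sees a circuit whose behavior differs from the intended function, which is the property the obfuscation relies on.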
Citations: 44
A Collaborative Defense Against Wear Out Attacks in Non-Volatile Processors
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196825
P. Cronin, Chengmo Yang, Yongpan Liu
While the Internet of Things (IoT) keeps advancing, its full adoption is continually blocked by power delivery problems. One promising solution is Non-Volatile (NV) processors, which harvest energy for themselves and employ an NV memory hierarchy. This allows them to perform computations when power is available, checkpoint and hibernate when power is scarce, and resume their work at a later time. However, utilizing NV memory creates new security vulnerabilities in the form of wear-out attacks on the register file. This paper explores the dangers of this design oversight and proposes a mitigation strategy that takes advantage of the unique properties and operating characteristics of NV processors. The proposed defense integrates the power management unit with a two-level register rotation approach, which improves NV processor endurance by 30.1× under attack and by an average of 7.1× in standard workloads.
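The rotation part of the defense can be simulated in a few lines. This is a simplified sketch: the paper's scheme additionally involves the power management unit and a second rotation level, and the register count, write count, and rotation period below are illustrative.

```python
NUM_REGS = 8
WRITES = 8000  # attacker repeatedly writes one logical register

def max_wear(rotate_every):
    """Maximum per-register write count when the logical->physical
    register map is rotated every `rotate_every` writes (0 = never)."""
    wear = [0] * NUM_REGS
    offset = 0
    for i in range(WRITES):
        if rotate_every and i % rotate_every == 0:
            offset = (offset + 1) % NUM_REGS  # advance the mapping
        wear[offset] += 1  # logical register 0 maps to physical `offset`
    return max(wear)

print(max_wear(0))    # 8000: every write lands on one physical register
print(max_wear(100))  # 1000: wear is spread evenly across the 8 registers
```

Spreading the attacker's writes over the whole file is exactly what raises the time-to-failure of any single NV cell, which is the endurance gain the abstract quantifies.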
Citations: 4
An Ultra-Low Energy Internally Analog, Externally Digital Vector-Matrix Multiplier Based on NOR Flash Memory Technology
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3195989
M. Mahmoodi, D. Strukov
Vector-matrix multiplication (VMM) is a core operation in many signal and data processing algorithms. Previous work showed that analog multipliers based on nonvolatile memories have superior energy efficiency compared to digital counterparts at low-to-medium computing precision. In this paper, we propose an extremely energy-efficient analog-mode VMM circuit with a digital input/output interface and configurable precision. Similar to some previous work, the computation is performed by a gate-coupled circuit utilizing embedded floating-gate (FG) memories. The main novelty of our approach is an ultra-low-power sensing circuitry, designed around a translinear Gilbert cell in topological combination with a floating resistor and a low-gain amplifier. Additionally, the digital-to-analog input conversion is merged with the VMM, while a current-mode algorithmic analog-to-digital converter is employed at the circuit backend. Such implementations of conversion and sensing allow the circuit to operate entirely in the current domain, resulting in high performance and energy efficiency. For example, post-layout simulation results for a 400 × 400 5-bit VMM circuit designed in a 55 nm process with embedded NOR flash memory show up to 400 MHz operation, 1.68 POps/J energy efficiency, and 39.45 TOps/mm2 computing throughput. Moreover, the circuit is robust against process-voltage-temperature variations, in part due to the inclusion of additional FG cells that are utilized for offset compensation.
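A digital reference model helps make the operation the analog circuit computes concrete. The dimensions and values below are illustrative, not from the paper's testbench; only the 5-bit input clipping mirrors the circuit's configurable precision.

```python
def vmm(vector, matrix, bits=5):
    """Vector-matrix multiply with operands clipped to `bits`-bit
    unsigned values, mirroring the circuit's configurable precision."""
    q = (1 << bits) - 1  # max representable value: 31 for 5 bits
    vq = [min(v, q) for v in vector]
    return [sum(vq[i] * min(matrix[i][j], q) for i in range(len(vq)))
            for j in range(len(matrix[0]))]

x = [1, 2, 3]
W = [[1, 0],
     [0, 1],
     [2, 2]]
print(vmm(x, W))  # [7, 8]
```

At the quoted 400 MHz with a 400 × 400 array, each cycle performs 160,000 multiply-accumulates in parallel, which is where the TOps-scale throughput figures come from.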
Citations: 38
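The "internally analog, externally digital" flow described above (digitize the inputs, compute the vector-matrix product in the analog domain, then re-digitize at a configurable precision) can be mimicked by a bit-accurate software reference. The uniform quantizer and function names below are illustrative assumptions, not the paper's circuit model; the point is only to show what an n-bit VMM computes end to end.

```python
# Sketch: software reference for an externally-digital, internally-analog
# VMM -- quantize inputs and weights to n bits, take the (here ideal)
# analog dot products, and re-digitize the normalized result.
import numpy as np

def quantize(x, n_bits):
    """Uniform quantization of values in [0, 1) to 2**n_bits levels."""
    levels = 2 ** n_bits
    return np.clip(np.floor(np.asarray(x, dtype=float) * levels),
                   0, levels - 1) / levels

def vmm(vector, matrix, n_bits=5):
    v = quantize(vector, n_bits)          # digital-to-analog input conversion
    m = quantize(matrix, n_bits)          # weights stored in FG cells
    y = v @ m                             # the analog computation: one dot product per column
    # normalize into [0, 1), re-digitize (the ADC step), then rescale
    return quantize(y / len(v), n_bits) * len(v)
```

For inputs that fall exactly on quantization levels, the reference reproduces the exact product; otherwise the result differs from the ideal by at most the n-bit quantization error.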
Journal
2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)