
Proceedings of the 56th Annual Design Automation Conference 2019: Latest Papers

RFTC
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317899
Darshana Jayasinghe, A. Ignjatović, S. Parameswaran
Random execution time-based countermeasures against power analysis attacks have lower resource overheads than countermeasures based on balancing power dissipation or masking. Previous randomization countermeasures use either a small number of clock frequencies or delays to randomize the execution. This paper presents a novel random frequency countermeasure (referred to as RFTC) that uses the dynamic reconfiguration ability of the clock managers of Field-Programmable Gate Arrays (FPGAs), such as the Xilinx Mixed-Mode Clock Manager (MMCM), which can change the frequency of operation at runtime. We show for the first time how the Advanced Encryption Standard (AES) block cipher algorithm can be executed using randomly selected clock frequencies (from among thousands of carefully chosen frequencies) generated within the FPGA to mitigate power analysis attack vulnerabilities. To test the effectiveness of the proposed clock randomization, Correlation Power Analysis (CPA) attacks are performed on the collected power traces. Power analysis attacks based on preprocessing methods such as Dynamic Time Warping (DTW), Principal Component Analysis (PCA), and Fast Fourier Transform (FFT) are also performed on the collected traces to test whether the random execution can be effectively removed. Compared to the state of the art, where there were 83 distinct finishing times for each encryption, the method described in this paper can have more than 60,000 distinct finishing times for each encryption, making it resistant to power analysis attacks even after preprocessing; it is demonstrated to be secure up to four million traces.
Citations: 14
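The RFTC abstract contrasts 83 distinct finishing times with more than 60,000. The following is a minimal sketch of why a larger set of clock periods enlarges the set of possible finishing times. It assumes, purely for illustration, that a fresh period is drawn for every clock cycle and that periods are integers in picoseconds; the function name and the period values are hypothetical and are not taken from the paper.

```python
def distinct_finishing_times(periods_ps, cycles):
    """Count the distinct total execution times that can occur when each of
    `cycles` clock cycles independently uses one period from `periods_ps`
    (integer picoseconds, hypothetical values)."""
    totals = {0}
    for _ in range(cycles):
        # Extend every reachable total by every available clock period.
        totals = {t + p for t in totals for p in periods_ps}
    return len(totals)


if __name__ == "__main__":
    few = [10_000, 11_000, 12_500]                  # three periods
    many = [10_000 + 37 * k for k in range(200)]    # two hundred periods
    print(distinct_finishing_times(few, cycles=10))
    print(distinct_finishing_times(many, cycles=10))
```

The count grows with both the number of available periods and the number of cycles, which is the property the abstract relies on; the toy does not attempt to reproduce the paper's figures.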
BitBlade
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317784
Sungju Ryu, Hyungjun Kim, Wooseok Yi, Jae-Joon Kim
Deep Neural Networks (DNNs) have various performance requirements and power constraints depending on the application. To maximize the energy efficiency of hardware accelerators for different applications, the accelerators need to support various bit-width configurations. When designing bit-reconfigurable accelerators, each processing element (PE) must have variable shift-addition logic, which takes a large amount of area and power. This paper introduces an area- and energy-efficient precision-scalable neural network accelerator (BitBlade), which reduces the control overhead of variable shift-addition by using a bitwise summation method. The proposed BitBlade, when synthesized in a 28nm CMOS technology, showed a reduction in area of 41% and in energy of 36-46% compared to the state-of-the-art precision-scalable architecture [14].
Citations: 51
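The BitBlade abstract mentions variable shift-addition and a bitwise summation method without giving details. The sketch below shows one plausible reading, stated as an assumption: an unsigned dot product is decomposed into 2-bit partial products, and instead of shifting every partial product individually, partial products that share the same shift amount are summed first and shifted once. All function names are hypothetical, and this is a software illustration rather than the paper's hardware design.

```python
def split_chunks(value, width=8, chunk=2):
    """Split an unsigned integer into little-endian `chunk`-bit pieces."""
    mask = (1 << chunk) - 1
    return [(value >> s) & mask for s in range(0, width, chunk)]


def dot_shift_per_product(xs, ws, width=8, chunk=2):
    """Baseline: shift every small partial product, then accumulate."""
    total = 0
    for x, w in zip(xs, ws):
        for i, xc in enumerate(split_chunks(x, width, chunk)):
            for j, wc in enumerate(split_chunks(w, width, chunk)):
                total += (xc * wc) << (chunk * (i + j))
    return total


def dot_sum_then_shift(xs, ws, width=8, chunk=2):
    """Assumed 'bitwise summation': add partial products that share a
    shift amount first, then apply each shift once per group."""
    n = width // chunk
    group_sums = [0] * (2 * n - 1)  # one accumulator per shift index i + j
    for x, w in zip(xs, ws):
        x_chunks = split_chunks(x, width, chunk)
        w_chunks = split_chunks(w, width, chunk)
        for i, xc in enumerate(x_chunks):
            for j, wc in enumerate(w_chunks):
                group_sums[i + j] += xc * wc
    return sum(g << (chunk * k) for k, g in enumerate(group_sums))
```

Both functions return the ordinary dot product for unsigned 8-bit inputs; the second needs only one shift per group instead of one per partial product, which is the kind of saving the abstract attributes to reduced shift-addition overhead.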
LSIM
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317856
Yu-Chuan Chang, Wei-Ming Chen, P. Hsiu, Yen-Yu Lin, Tei-Wei Kuo
Perceptual similarity measurement allows mobile applications to eliminate unnecessary computations without compromising visual experience. Existing pixel-wise measures incur significant overhead with increasing display resolutions and frame rates. This paper presents an ultra-lightweight similarity measure called LSIM, which assesses the similarity between frames based on the transformation matrices of graphics objects. To evaluate its efficacy, we integrate LSIM into the Open Graphics Library and conduct experiments on an Android smartphone with various mobile 3D games. The results show that LSIM is highly correlated with the most widely used pixel-wise measure SSIM, yet three to five orders of magnitude faster. We also apply LSIM to a CPU-GPU governor to suppress the rendering of similar frames, thereby further reducing computation energy consumption by up to 27.3% while maintaining satisfactory visual quality.
CCS Concepts: Computer systems organization → Embedded software; Computing methodologies → Graphics processors; Perception.
Citations: 3
DREAMPlace
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317803
Yibo Lin, Shounak Dhar, Wuxi Li, Haoxing Ren, Brucek Khailany, D. Pan
Placement for very-large-scale integrated (VLSI) circuits is one of the most important steps for design closure. This paper proposes a novel GPU-accelerated placement framework, DREAMPlace, by casting the analytical placement problem equivalently to training a neural network. Implemented on top of the widely adopted deep learning toolkit PyTorch, with customized key kernels for wirelength and density computations, DREAMPlace can achieve over $30\times$ speedup in global placement without quality degradation compared to the state-of-the-art multi-threaded placer RePlAce. We believe this work shall open up new directions for revisiting classical EDA problems with advancement in AI hardware and software.
Citations: 141
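The DREAMPlace abstract describes casting analytical placement as neural network training, that is, minimizing a placement objective by gradient descent over cell coordinates. The sketch below illustrates that idea in one dimension under strong simplifying assumptions: it uses plain Python instead of PyTorch, a quadratic two-pin wirelength instead of the paper's wirelength kernel, and no density term at all. The function names and the example netlist are hypothetical.

```python
def quadratic_wirelength(pos, nets):
    """Sum of squared distances over two-pin nets (a simplified objective)."""
    return sum((pos[a] - pos[b]) ** 2 for a, b in nets)


def gradient_descent_placement(pos, movable, nets, lr=0.1, steps=200):
    """Move the `movable` nodes along the negative gradient of the
    quadratic wirelength; all other nodes stay fixed."""
    pos = dict(pos)  # work on a copy
    for _ in range(steps):
        grad = {n: 0.0 for n in movable}
        for a, b in nets:
            d = 2.0 * (pos[a] - pos[b])  # derivative of (x_a - x_b)^2 w.r.t. x_a
            if a in grad:
                grad[a] += d
            if b in grad:
                grad[b] -= d
        for n in movable:
            pos[n] -= lr * grad[n]
    return pos


if __name__ == "__main__":
    start = {"left_pad": 0.0, "c1": 0.0, "c2": 0.0, "right_pad": 10.0}
    nets = [("left_pad", "c1"), ("c1", "c2"), ("c2", "right_pad")]
    placed = gradient_descent_placement(start, {"c1", "c2"}, nets)
    print(placed, quadratic_wirelength(placed, nets))
```

In this chain example the two movable cells settle at evenly spaced positions between the fixed pads. A real placer also needs a density penalty to keep cells from overlapping, which this sketch omits.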
Enabling Complex Stimuli in Accelerated Mixed-Signal Simulation
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317815
S. Divanbeigi, E. Aditya, Zhongpin Wang, M. Olbrich
In the era of advancing technology, increasing circuit complexity requires faster simulators for the verification step. The piece-wise linear simulation approach provides an efficient and accurate solution. In this paper, a state-of-the-art mixed-signal simulator is explained. The approach is extended to new exponential and quadratic stimuli. This requires a comprehensive derivation of mathematical equations, which removes the need for computationally expensive evaluation. The new stimuli are simulated in several circuits and compared to a conventional simulator. The results show significant run-time acceleration with high accuracy. The approach therefore meets the industrial requirement of simulating various input forms and non-linear components.
Citations: 1
Filianore
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317850
S. Bian, Masayuki Hiromoto, Takashi Sato
The (ring) learning with errors (RLWE/LWE) problem is one of the most promising candidates for constructing quantum-secure key exchange protocols. In this work, we design and implement specialized hardware multiplier units for both LWE and RLWE key exchange schemes to maximize their computational efficiency. By exploiting the algebraic structure with aggressive parameter sets, we show that the design and implementation of LWE key exchange in hardware is considerably easier and more flexible than that of RLWE. Using the proposed architectures, we show that the client-side energy efficiency of LWE-based key exchange can be on the same order as, or even (slightly) better than, that of RLWE-based schemes, making LWE an attractive option for designing post-quantum cryptographic suites.
Citations: 13
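The Filianore abstract centers on hardware multipliers for LWE key exchange. As a rough illustration of the arithmetic those multipliers would perform, the toy sketch below runs the core of a plain LWE exchange: modular matrix-vector products plus small error terms, after which both parties hold values that agree up to a small difference. The parameters (`n`, `q`, `bound`) are hypothetical and far too small to be secure, no reconciliation step is included, and nothing here is taken from the paper's parameter sets or architecture.

```python
import random


def lwe_toy_exchange(n=8, q=4096, bound=1, seed=0):
    """Return the two approximately equal values held by Alice and Bob."""
    rng = random.Random(seed)

    def small_vector():
        # Secrets and errors have entries in [-bound, bound].
        return [rng.randint(-bound, bound) for _ in range(n)]

    A = [[rng.randrange(q) for _ in range(n)] for _ in range(n)]  # public matrix
    s, e = small_vector(), small_vector()      # Alice's secret and error
    sp, ep = small_vector(), small_vector()    # Bob's secret and error

    # Alice publishes b = A s + e (mod q).
    b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(n)]
    # Bob publishes b' = s'^T A + e'^T (mod q).
    bp = [(sum(sp[i] * A[i][j] for i in range(n)) + ep[j]) % q for j in range(n)]

    v_alice = sum(bp[j] * s[j] for j in range(n)) % q   # s'^T A s + e'.s
    v_bob = sum(sp[i] * b[i] for i in range(n)) % q     # s'^T A s + s'.e
    return v_alice, v_bob


def centered_distance(a, b, q=4096):
    """Distance between a and b on the ring of integers modulo q."""
    d = (a - b) % q
    return min(d, q - d)
```

The two values differ only by `e'.s - s'.e`, whose magnitude is at most `2 * n * bound**2`; a practical scheme adds a reconciliation or encoding step so that both sides derive identical key bits from values this close.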
ACCESS
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317756
M. Schwarz, Raphael Stahl, Daniel Müller-Gritschneder, Ulf Schlichtmann, D. Stoffel, W. Kunz
This book is a comprehensive overview of invited contributions on Helicobacter pylori infection in gastritis and gastric carcinogenesis. The first part of the book covers topics related to the pathophysiology of gastric mucosal defense system and gastritis including the gastroprotective function of the mucus, the capsaicin-sensitive afferent nerves and the oxidative stress pathway involved in inflammation, apoptosis and autophagy in H. pylori related gastritis. The next chapters deal with molecular pathogenesis and treatment, which consider the role of neuroendocrine cells in gastric disease, DNA methylation in H. pylori infection, the role of antioxidants and phytotherapy in gastric disease. The final part presents the effects of cancer risk factors associated with H. pylori infection. These chapters discuss the serum pepsinogen test, K-ras mutations, cell kinetics, and H. pylori lipopolysaccharide, as well as the roles of several bacterial genes (cagA, cagT, vacA and dupA) as virulence factors in gastric cancer, and the gastrokine-1 protein in cancer progression.
Citations: 3
DeePattern
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317795
Haoyu Yang, P. Pathak, Frank Gennari, Ya-Chieh Lai, Bei Yu
VLSI layout patterns provide critical resources for various design-for-manufacturability research efforts, from early technology node development to back-end design and sign-off flows. However, a diverse layout pattern library is not always available because of the long logic-to-chip design cycle, which slows down technology node development. To address this issue, we explore the capability of generative machine learning models to synthesize layout patterns. A transforming convolutional auto-encoder is developed to learn vector-based instantiations of squish pattern topologies. We show that our framework can capture simple design rules and contributes to enlarging the existing squish topology space under certain transformations. Geometry information of each squish topology is obtained from an associated linear system derived from design rule constraints. Experiments on 7 nm EUV designs show that our framework can more effectively generate diverse pattern libraries with DRC-clean patterns compared to a state-of-the-art industrial layout pattern generator.
Citations: 27
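The DeePattern abstract refers to squish pattern topologies without defining them. Assuming the usual description of a squish pattern (a layout clip encoded as a binary topology matrix plus vectors of spacings between scan lines placed at every shape edge), the sketch below builds that encoding for axis-aligned rectangles. The function name and the tuple layout `(x0, y0, x1, y1)` are hypothetical, and rectangles are assumed to lie inside the clip window.

```python
def squish(rects, window):
    """Encode rectangles inside `window` as (topology, dx, dy).

    rects  : list of (x0, y0, x1, y1) axis-aligned rectangles
    window : (x0, y0, x1, y1) clip boundary
    """
    wx0, wy0, wx1, wy1 = window
    # Scan lines sit at the window boundary and at every rectangle edge.
    xs = sorted({wx0, wx1} | {x for r in rects for x in (r[0], r[2])})
    ys = sorted({wy0, wy1} | {y for r in rects for y in (r[1], r[3])})

    topology = []
    for y_lo, y_hi in zip(ys, ys[1:]):
        row = []
        for x_lo, x_hi in zip(xs, xs[1:]):
            # A grid cell is 1 if some rectangle fully covers it.
            covered = any(
                r[0] <= x_lo and x_hi <= r[2] and r[1] <= y_lo and y_hi <= r[3]
                for r in rects
            )
            row.append(1 if covered else 0)
        topology.append(row)

    dx = [b - a for a, b in zip(xs, xs[1:])]  # column widths
    dy = [b - a for a, b in zip(ys, ys[1:])]  # row heights
    return topology, dx, dy
```

Separating the topology matrix from the spacing vectors is what lets a generative model work on topologies alone, with geometry recovered afterwards, as the abstract's mention of a linear system derived from design rule constraints suggests.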
ARGA
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317776
Daniel Peroni, M. Imani, Hamid Nejatollahi, N. Dutt, Tajana Rosing
Many data-driven applications, including computer vision, speech recognition, and medical diagnostics, show tolerance to error during computation. These applications are often accelerated on GPUs, but high computational costs limit performance and increase energy usage. In this paper, we present ARGA, an approximate computing technique capable of accelerating GPGPU applications. ARGA provides an approximate lookup table to GPGPU cores to avoid recomputing instructions with identical or similar values. We propose multi-table parallel lookup, which enables computational reuse to significantly speed up GPGPU computation by checking incoming instructions in parallel. The inputs of each operation are searched for in a lookup table. Matches resulting in an exact result or a low error are removed from the floating-point pipeline and used directly as output. Matches producing highly inaccurate results are computed on exact hardware to minimize application error. We simulate our design by placing ARGA within each core of an Nvidia Kepler Architecture Titan and an AMD Southern Island 7970. We show that our design improves performance throughput by up to $2.7\times$ and improves EDP by $5.3\times$ for 6 GPGPU applications while maintaining less than 5% output error. We also show that ARGA accelerates inference of a LeNet NN by $2.1\times$ and improves EDP by $3.7\times$ without significantly impacting classification accuracy.
CCS Concepts: Computer systems organization $\rightarrow$ Multicore architectures; Computing methodologies $\rightarrow$ Machine learning approaches.
Citations: 7
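The ARGA abstract describes reusing stored results for operations whose inputs are identical or similar. A minimal software sketch of that idea follows, assuming a single table for floating-point multiplication in which inputs count as "similar" when they round to the same multiple of a quantization step. The class name, the `step` parameter, and the keying scheme are hypothetical; the paper's multi-table parallel hardware lookup and its error-based fallback are not modeled.

```python
class ApproxMulTable:
    """Approximate lookup table for multiplication (illustrative only)."""

    def __init__(self, step=0.25):
        self.step = step    # quantization step that defines "similar" inputs
        self.table = {}     # quantized input pair -> stored exact result
        self.hits = 0
        self.misses = 0

    def _key(self, a, b):
        # Inputs that round to the same multiples of `step` share an entry.
        return (round(a / self.step), round(b / self.step))

    def multiply(self, a, b):
        key = self._key(a, b)
        if key in self.table:
            # Reuse a previously computed result instead of recomputing.
            self.hits += 1
            return self.table[key]
        # Miss: compute exactly and remember the result for similar inputs.
        self.misses += 1
        result = a * b
        self.table[key] = result
        return result
```

A smaller `step` lowers the approximation error but also lowers the hit rate, which mirrors the accuracy-versus-reuse trade-off the abstract describes.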
Tumbler
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317927
Yue Xu, H. Lee, Yujuan Tan, Yu Wu, Xianzhang Chen, Liang Liang, Lei Qiao, Duo Liu
Energy harvesting technology has been widely adopted in embedded systems. However, an unstable energy source results in unsteady operation. In this paper, we devise a long-term energy-efficient task scheduling method for solar-powered sensor nodes. The proposed method combines reinforcement learning with a solar energy prediction method to maximize energy efficiency, which in turn enhances the long-term quality of service (QoS) of the sensor nodes. Experimental results show that the proposed scheduling improves energy efficiency by 6.0% on average and achieves a 54.0% better QoS level compared with a state-of-the-art task scheduling algorithm.
Citations: 12