Speech recognition combined with artificial intelligence represents a quantum leap in accuracy over past pattern-recognition methods. Server-based systems, with their scalability, virtualization, and virtually unlimited storage resources, have contributed greatly to improving prediction accuracy. However, this server-oriented approach has led to severe latency and connectivity problems. We therefore propose a novel client-edge speech recognition system that reduces latency using what we call semi-offloading. This approach promises large performance gains by offloading computing-power-dependent tasks to edge nodes while the client processes throughput-dependent tasks. Semi-offloading, combined with this division of the workload, allows the processing stages to run in parallel and to be reordered. Experimental results show a 23% to 62% improvement in response time.
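Why can splitting the work between client and edge reduce response time? The stages can overlap. The sketch below is a minimal model of that effect. It assumes a hypothetical two-stage pipeline: the client prepares audio chunk i+1 (for example, feature extraction) while the edge runs the acoustic model on chunk i, and network transfer is folded into the edge time. The function names and stage times are illustrative only and do not come from the paper.

```python
from typing import Sequence


def sequential_latency(client_ms: Sequence[float], edge_ms: Sequence[float]) -> float:
    """No overlap: each chunk finishes on the client and then on the edge
    before the next chunk starts."""
    return sum(client_ms) + sum(edge_ms)


def pipelined_latency(client_ms: Sequence[float], edge_ms: Sequence[float]) -> float:
    """Two-stage pipeline: the client prepares chunk i+1 while the edge processes chunk i.

    Assumption (hypothetical): client stage = feature extraction, edge stage =
    acoustic model, with network transfer folded into the edge time.
    """
    if len(client_ms) != len(edge_ms):
        raise ValueError("client and edge stage lists must have the same length")
    client_done = 0.0  # time at which the client finishes its current chunk
    edge_done = 0.0    # time at which the edge finishes its current chunk
    for c, e in zip(client_ms, edge_ms):
        client_done += c
        # The edge can start a chunk only when the chunk is ready and the edge is free.
        edge_done = max(edge_done, client_done) + e
    return edge_done
```

Take three chunks with hypothetical stage times of 2 units on the client and 3 units on the edge. `sequential_latency` returns 15 and `pipelined_latency` returns 11, because the client's work hides behind the edge's work.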
Young-Min Lee, Joon-Sung Yang. "Computation offloading of acoustic model for client-edge-based speech-recognition: work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351534
Rakibul Hassan, S. Rafatirad, H. Homayoun, Sai Manoj Pudukotai Dinakarrao
Logic obfuscation has emerged as an efficient solution for strengthening the security of integrated circuits (ICs) against multiple threats, including reverse engineering and intellectual property (IP) theft. However, Boolean satisfiability (SAT) attacks and their variants have been shown to circumvent security mechanisms such as obfuscation and its many variants. Depending on the size of the IC, validating a defense (i.e., obfuscation) against a SAT attack can take anywhere from a few milliseconds to days. In contrast, our current work focuses on devising an iterative, dynamic, and intelligent SAT-hard clause generator for a given SAT-prone problem. The proposed machine learning (ML)-based SAT-to-unSAT clause translator is a SAT-hard clause generator built on a bipartite-propagation-based neural network model. The model comprises multiple layers of artificial neural networks that extract the dependencies between literals and variables, followed by long short-term memory (LSTM) networks that validate SAT hardness. The translator is trained on conjunctive normal form (CNF) representations of IC netlists that are either SAT-solvable or SAT-hard. The model is also trained to convert a CNF from satisfiable (SAT) to unsatisfiable (unSAT) form with minor perturbations (which translate to minor overheads), so that the SAT attack cannot decrypt the keys. To the best of our knowledge, no previous work has reported a neural-network-based SAT-hard clause or CNF translator for circuit obfuscation. We evaluate the proposed model's empirical performance against MiniSAT on 300 CNFs.
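The sketch below makes the SAT/unSAT distinction concrete on a tiny CNF written with DIMACS-style integer literals. It uses a brute-force checker as a stand-in for a real solver such as MiniSAT. The single added clause is a hand-picked toy perturbation, not the output of the paper's learned translator, and the function name is hypothetical.

```python
from itertools import product
from typing import List

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def is_satisfiable(cnf: List[Clause]) -> bool:
    """Brute-force SAT check (hypothetical helper); exponential, so only for tiny formulas."""
    variables = sorted({abs(lit) for clause in cnf for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A CNF is satisfied when every clause contains at least one true literal.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf):
            return True
    return False


# Toy example, not the paper's neural translator:
cnf = [[1, 2], [-1, 2]]   # satisfiable: x2 = True works
perturbed = cnf + [[-2]]  # one added clause forces x2 = False, leaving no solution
print(is_satisfiable(cnf), is_satisfiable(perturbed))  # True False
```

A brute-force search is enough here. The number of assignments grows as 2^n, which is also why validating real netlist CNFs needs a dedicated solver.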
Rakibul Hassan, S. Rafatirad, H. Homayoun, Sai Manoj Pudukotai Dinakarrao. "SAT to SAT-hard clause translator: work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351542
Phase Change Memory (PCM) is considered a potential replacement for DRAM as main memory because of its better scalability. However, writing 0s to PCM cells requires high-temperature RESET operations, whose excessive heat dissipation induces write-disturbance errors in neighboring idle PCM cells. This paper introduces low-temperature partial-RESET operations for writing 0s to PCM cells. Compared with traditional RESET operations, partial-RESET operations dissipate negligible heat and therefore do not cause disturbance errors in neighboring cells during PCM writes.
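The sketch below illustrates which cells are exposed to write disturbance using a deliberately simplified one-dimensional model. It rests on three assumptions that are not from the paper:
- Only cells whose value changes are written (a differential write).
- A full RESET heats both immediate neighbors.
- A partial-RESET exposes no neighbors, following the abstract's claim of negligible heat.

The function name is hypothetical.

```python
from typing import List


def disturbed_neighbors(old: List[int], new: List[int], partial_reset: bool = False) -> List[int]:
    """Return indices of idle cells adjacent to a cell being RESET (written to 0).

    Hypothetical 1-D model: differential write (only changed cells are written),
    and a full RESET heats both immediate neighbours. Partial-RESET is modelled
    as dissipating negligible heat, so it exposes no neighbours.
    """
    if len(old) != len(new):
        raise ValueError("old and new words must have the same width")
    if partial_reset:
        return []
    written = {i for i, (o, n) in enumerate(zip(old, new)) if o != n}
    reset_cells = [i for i in written if new[i] == 0]
    exposed = set()
    for i in reset_cells:
        for j in (i - 1, i + 1):
            # Idle = inside the row and not written in this operation.
            if 0 <= j < len(new) and j not in written:
                exposed.add(j)
    return sorted(exposed)
```

For example, rewriting `[1, 1, 1, 1]` as `[1, 0, 1, 1]` RESETs cell 1 and exposes idle cells 0 and 2. With `partial_reset=True` the model exposes none.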
Chao H. Huang, Ishan G. Thakkar. "Mitigating write disturbance in phase change memory architectures: work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351539
Abhijitt Dhavlle, S. Bhat, S. Rafatirad, H. Homayoun, Sai Manoj Pudukotai Dinakarrao
In recent years, the hardware security domain has seen a plethora of side-channel attacks (SCAs), with cache-based SCAs among the dominant threats. These SCAs exploit side channels that invariably leak important data during an application's execution. Shutting down the side channels is not feasible because of the restrictions it would impose on system performance. To overcome such concerns and protect data integrity, we introduce Sequence-Crafter (SC). Rather than attempting to close the side channels, SC aims to minimize the entropy of the information leaked through them. To achieve this, we insert carefully crafted perturbations into the victim application and activate them at random, so that the attacker observes misleading but legitimate-looking information. The methodology has been tested against the Flush+Reload attack, and the key information observed by the attacker was found to be completely futile, indicating the success of the proposed method.
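The sketch below shows how randomly activated perturbations can spoil what a Flush+Reload-style attacker learns. It is a toy model under hypothetical assumptions:
- Each key bit makes the victim touch cache line "A" (bit 1) or "B" (bit 0).
- A perturbation adds a decoy access to the other line.
- The attacker can only decode iterations in which exactly one line was hit.

This decoy scheme is much simpler than the perturbations SC actually crafts. All names are illustrative.

```python
import random
from typing import List, Optional, Set


def victim_trace(key_bits: List[int], p_perturb: float, rng: random.Random) -> List[Set[str]]:
    """Cache lines touched per iteration (hypothetical victim): 'A' for a 1 bit, 'B' for a 0 bit.

    With probability p_perturb a decoy access to the other line is added,
    hiding which branch was actually taken.
    """
    trace = []
    for bit in key_bits:
        touched = {"A" if bit else "B"}
        if rng.random() < p_perturb:
            touched.add("B" if bit else "A")  # decoy access
        trace.append(touched)
    return trace


def attacker_guess(trace: List[Set[str]]) -> List[Optional[int]]:
    """Flush+Reload-style inference: a lone hit reveals the bit; two hits are ambiguous."""
    guesses: List[Optional[int]] = []
    for touched in trace:
        if touched == {"A"}:
            guesses.append(1)
        elif touched == {"B"}:
            guesses.append(0)
        else:
            guesses.append(None)  # both lines hit: no information about this bit
    return guesses
```

With `p_perturb=0.0` the attacker recovers the key exactly. With `p_perturb=1.0` every iteration is ambiguous. Intermediate values trade perturbation overhead against leakage.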
Abhijitt Dhavlle, S. Bhat, S. Rafatirad, H. Homayoun, Sai Manoj Pudukotai Dinakarrao. "Sequence-crafter: side-channel entropy minimization to thwart timing-based side-channel attacks: work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351543
M. F. Vázquez, Anup Saha, Rafael Medina Morillas, Miguel Chavarrías Lapastora, Fernando Pescador del Oso
The sustained growth in digital video features and network traffic requirements demands greater efficiency from both video coding standards and the platforms that run them. In this context, part of a first version of the future de facto standard, Versatile Video Coding (VVC), is migrated to a GPU-based architecture integrated into a heterogeneous platform. Results show an 11-fold improvement for the new Adaptive Multiple Transform (AMT) transforms.
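The sketch below illustrates what the AMT transforms compute. It assumes the DST-VII and DCT-VIII kernels used alongside DCT-II in early VVC test models, written in their floating-point orthonormal form; the codec itself specifies integer approximations. Each block is transformed by two small matrix products, applied separably to columns and rows. Blocks are independent of one another, which plausibly explains why this step ports well to a GPU. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dst7(n: int) -> Matrix:
    """Floating-point DST-VII basis (VVC uses an integer approximation); row i = basis i."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]


def dct8(n: int) -> Matrix:
    """Floating-point DCT-VIII basis (VVC uses an integer approximation); row i = basis i."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def forward_2d(block: Matrix, vertical: Matrix, horizontal: Matrix) -> Matrix:
    """Separable transform Y = V * X * H^T: V acts on columns, H acts on rows."""
    return matmul(matmul(vertical, block), transpose(horizontal))


def inverse_2d(coeffs: Matrix, vertical: Matrix, horizontal: Matrix) -> Matrix:
    """Orthonormal bases invert by transposition: X = V^T * Y * H."""
    return matmul(matmul(transpose(vertical), coeffs), horizontal)
```

Vertical and horizontal kernels may differ in type and size, so a 4x8 block can use `dst7(4)` on its columns and `dct8(8)` on its rows. Applying `inverse_2d` to the output of `forward_2d` recovers the block up to floating-point error.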
M. F. Vázquez, Anup Saha, Rafael Medina Morillas, Miguel Chavarrías Lapastora, Fernando Pescador del Oso. "Porting new versatile video coding transforms to a heterogeneous GPU-based technology: work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351540
To evade detection by traditional and emerging approaches, malware developers have increasingly embedded malware into benign applications. Detecting such stealthy malware robustly calls for a localized feature-extraction scheme. To address this challenge, we introduce a hybrid approach that uses both the application binary and the microarchitectural traces obtained through on-chip hardware performance counters (HPCs) for malware detection. The collected HPC traces are fed to a multi-stage machine learning (ML) classifier that detects and classifies the malware. To catch stealthy malware, an image-processing-based approach is applied in parallel: malware binaries are converted into images, which are then converted into sequences and fed to recurrent neural networks that recognize patterns of stealthy malware. Based on these localized patterns, sequence classification is applied to perform binary classification and to identify the variant of the detected malware family. Our proposed framework exhibits high resilience to popular obfuscation techniques such as code relocation.
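The binary-to-image-to-sequence step can be sketched as follows, under stated assumptions. The raw bytes are laid out as grayscale pixels in rows of fixed width, with zero padding at the end. Each row then becomes one time step of a sequence for a recurrent network. The default width of 16, the zero padding, and the [0, 1] scaling are illustrative choices, not details from the paper, and the function names are hypothetical.

```python
from typing import List


def binary_to_image(data: bytes, width: int = 16) -> List[List[int]]:
    """Reshape raw bytes into a grayscale image with `width` columns (pixels 0-255).

    Hypothetical choices: width=16 is arbitrary, and the last row is zero-padded
    so every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # iterating bytes yields ints 0-255
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def image_to_sequence(image: List[List[int]]) -> List[List[float]]:
    """Treat each image row as one time step, scaled to [0, 1] for an RNN input."""
    return [[pixel / 255.0 for pixel in row] for row in image]
```

For example, `binary_to_image(bytes(range(10)), width=4)` produces three rows, the last of which is `[8, 9, 0, 0]`. `image_to_sequence` then yields three time steps of four features each.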
Sanket Shukla, Gaurav Kolhe, S. D, S. Rafatirad. "MicroArchitectural events and image processing-based hybrid approach for robust malware detection: work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351538
Kwangbae Lee, Hoseung Kim, Hayun Lee, Dongkun Shin
Network pruning is a promising compression technique for reducing the computation and memory-access costs of deep neural networks. In this paper, we propose a novel group-level pruning method to accelerate deep neural networks on mobile GPUs, in which several adjacent weights are pruned as a group while high accuracy is maintained. Although several group-level pruning techniques have been proposed, previous techniques cannot achieve the desired accuracy at high sparsity. We therefore propose an unaligned approach to improve the accuracy of the compressed model.
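The sketch below shows what pruning "several adjacent weights in a group" means, using a conventional aligned variant. Groups start at fixed multiples of the group size, and the groups with the smallest L1 norm are zeroed. The authors' unaligned approach relaxes this alignment in ways the abstract does not specify, so it is not reproduced here. The function name and the L1-norm criterion are assumptions.

```python
from typing import List


def aligned_group_prune(weights: List[float], group_size: int, sparsity: float) -> List[float]:
    """Zero the fraction `sparsity` of fixed-boundary groups with the smallest L1 norm.

    Conventional aligned grouping (hypothetical baseline), not the paper's unaligned method.
    """
    if group_size <= 0 or len(weights) % group_size != 0:
        raise ValueError("weights length must be a positive multiple of group_size")
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    norms = [sum(abs(w) for w in g) for g in groups]
    n_prune = int(len(groups) * sparsity)  # floor: never exceed the target sparsity
    # Indices of the n_prune groups with the smallest L1 norm.
    pruned = set(sorted(range(len(groups)), key=lambda i: norms[i])[:n_prune])
    result: List[float] = []
    for idx, g in enumerate(groups):
        result.extend([0.0] * len(g) if idx in pruned else g)
    return result
```

Because whole groups of neighboring weights become zero, the surviving weights stay in contiguous runs. This is what makes group-level sparsity easier to exploit on vector hardware than scattered single-weight sparsity.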
Kwangbae Lee, Hoseung Kim, Hayun Lee, Dongkun Shin. "Flexible group-level pruning of deep neural networks for fast inference on mobile CPUs: work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351537
As the storage capacity per unit area of NAND flash memory has rapidly increased, the error rate per P/E cycle has risen rapidly as well. ECC modules such as LDPC have been added to flash controllers to recover from these errors. However, system designs that extend the lifetime of flash storage devices are still in great demand. In this paper, we design an LDPC encoding and decoding scheme that applies rate-compatible LDPC codes to adjust the code rate stepwise according to the P/E cycle count, together with a management scheme for the excess parity data. This improves the error recovery rate of flash storage systems and extends the lifetime of NAND flash storage while reducing the read and write overhead caused by the additional parity data.
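The sketch below works through the arithmetic of a stepwise code-rate schedule. The code rate k/n falls as more parity is attached to the same data once the block passes each P/E threshold. The thresholds, parity sizes, and 4 KiB data chunk are hypothetical. A real rate-compatible LDPC scheme derives these rates from one mother code, and the paper's management of the excess parity is not modeled here.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical schedule: (minimum P/E cycles, parity bytes per 4 KiB data chunk).
PARITY_SCHEDULE: List[Tuple[int, int]] = [(0, 256), (3000, 384), (6000, 512)]
DATA_BYTES = 4096


def parity_for(pe_cycles: int, schedule: List[Tuple[int, int]] = PARITY_SCHEDULE) -> int:
    """Pick the parity size of the highest step whose P/E threshold has been reached."""
    thresholds = [t for t, _ in schedule]
    step = bisect_right(thresholds, pe_cycles) - 1
    if step < 0:
        raise ValueError("pe_cycles is below the first threshold")
    return schedule[step][1]


def code_rate(pe_cycles: int) -> float:
    """Code rate k / n = data / (data + parity); it drops stepwise as the block wears."""
    parity = parity_for(pe_cycles)
    return DATA_BYTES / (DATA_BYTES + parity)
```

Under this schedule a fresh block uses a code rate of 4096/4352 (about 0.94), and a block past 6000 P/E cycles drops to 4096/4608 (about 0.89). Stronger protection arrives only once wear makes it necessary.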
Jae-Bin Lee, Geon-Myeong Kim, Seungho Lim. "ECC management with rate compatible LDPC code for NAND flash storage: work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351535
Danielle Tchuinkou Kwadjo, Joel Mandebi Mbongue, C. Bobda
This work proposes a generic flow for designing application-specific FPGA overlays that achieve bare-metal performance while improving productivity, which could lead to wider adoption of FPGAs by software developers. The approach relies on automatically extracting kernels from high-level-language applications. The extracted kernels are then systematically translated into optimized hardware circuits using RapidWright, which makes it possible to bypass HDL design flows. Initial results show up to 19x productivity improvement over regular overlays, and a higher Fmax than bare-metal designs in several cases.
Danielle Tchuinkou Kwadjo, Joel Mandebi Mbongue, C. Bobda. "Automatic generation of application-specific FPGA overlays: work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351533
Runtime Integrated Custom Execution (RICE) moves reconfigurable acceleration devices, traditionally attached as peripherals, into the processor pipeline. This relocation unlocks fine-grained acceleration that was previously impeded by the communication overhead of reaching a peripheral accelerator. Preliminary simulation results on a subset of the PARSEC benchmark suite show promise for RICE in HPC applications.
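Why does communication overhead block fine-grained acceleration? A simple cost model shows it. The offloaded time is the accelerated compute time plus a fixed round-trip cost, so short kernels cannot amortize a large overhead. The function name and the cycle counts in the usage note are hypothetical and are not measurements from the paper.

```python
def effective_speedup(kernel_cycles: float, accel_speedup: float, overhead_cycles: float) -> float:
    """Speedup of offloading one kernel invocation, including a fixed communication cost.

    Simplified model (hypothetical): offloaded time = kernel_cycles / accel_speedup + overhead_cycles.
    """
    if kernel_cycles <= 0 or accel_speedup <= 0 or overhead_cycles < 0:
        raise ValueError("cycles and speedup must be positive; overhead must be non-negative")
    return kernel_cycles / (kernel_cycles / accel_speedup + overhead_cycles)
```

Consider a hypothetical 100-cycle kernel on a 4x accelerator. With a 200-cycle peripheral round trip the "speedup" is 100 / 225, about 0.44, which is a slowdown. With a 1-cycle in-pipeline path it is 100 / 26, about 3.85.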
Leela Pakanati, John T. McMichen, Z. Estrada. "Fine-grained acceleration using runtime integrated custom execution (RICE): work-in-progress." Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion, 2019-10-13. https://doi.org/10.1145/3349569.3351536