Drones are commonly used in mission-critical applications, and the accurate estimation of available battery capacity before flight is critical for reliable and efficient mission planning. To this end, the battery voltage should be predicted accurately prior to launching a drone. However, in drone applications, a rise in the battery's internal temperature changes the voltage significantly and leads to challenges in voltage prediction. In this paper, we propose a battery voltage prediction method that takes into account the battery's internal temperature to accurately estimate the available capacity of the drone battery. Specifically, we devise a temporal temperature factor (TTF) metric that is calculated by accumulating time-series data about the battery's discharge history. We employ a machine learning-based prediction model, reflecting the TTF metric, to achieve high prediction accuracy and low complexity. We validated the accuracy and complexity of our model through extensive evaluation. The results show that the proposed model is accurate, with less than 1.5% error, and readily operates on resource-constrained embedded devices.
{"title":"Voltage prediction of drone battery reflecting internal temperature","authors":"Jiwon Kim, Seunghyeok Jeon, Jaehyun Kim, H. Cha","doi":"10.1145/3489517.3530448","DOIUrl":"https://doi.org/10.1145/3489517.3530448","url":null,"abstract":"Drones are commonly used in mission-critical applications, and the accurate estimation of available battery capacity before flight is critical for reliable and efficient mission planning. To this end, the battery voltage should be predicted accurately prior to launching a drone. However, in drone applications, a rise in the battery's internal temperature changes the voltage significantly and leads to challenges in voltage prediction. In this paper, we propose a battery voltage prediction method that takes into account the battery's internal temperature to accurately estimate the available capacity of the drone battery. To this end, we devise a temporal temperature factor (TTF) metric that is calculated by accumulating time series data about the battery's discharge history. We employ a machine learning-based prediction model, reflecting the TTF metric, to achieve high prediction accuracy and low complexity. We validated the accuracy and complexity of our model with extensive evaluation. The results show that the proposed model is accurate with less than 1.5% error and readily operates on resource-constrained embedded devices.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133682089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning a feasible representation from raw gate-level netlists is essential for incorporating machine learning techniques into logic synthesis, physical design, and verification. Existing message-passing-based graph learning methodologies focus merely on graph topology while overlooking gate functionality, and thus often fail to capture the underlying semantics, limiting their generalizability. To address this concern, we propose a novel netlist representation learning framework that utilizes a contrastive scheme to effectively acquire generic functional knowledge from netlists. We also propose a customized graph neural network (GNN) architecture that learns a set of independent aggregators to better cooperate with the above framework. Comprehensive experiments on multiple complex real-world designs demonstrate that our proposed solution significantly outperforms state-of-the-art netlist feature learning flows.
{"title":"Functionality matters in netlist representation learning","authors":"Chen Bai, Zhuolun He, Guangliang Zhang, Qiang Xu, Tsung-Yi Ho, Bei Yu, Yu Huang","doi":"10.1145/3489517.3530410","DOIUrl":"https://doi.org/10.1145/3489517.3530410","url":null,"abstract":"Learning feasible representation from raw gate-level netlists is essential for incorporating machine learning techniques in logic synthesis, physical design, or verification. Existing message-passing-based graph learning methodologies focus merely on graph topology while overlooking gate functionality, which often fails to capture underlying semantic, thus limiting their generalizability. To address the concern, we propose a novel netlist representation learning framework that utilizes a contrastive scheme to acquire generic functional knowledge from netlists effectively. We also propose a customized graph neural network (GNN) architecture that learns a set of independent aggregators to better cooperate with the above framework. Comprehensive experiments on multiple complex real-world designs demonstrate that our proposed solution significantly outperforms state-of-the-art netlist feature learning flows.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133250409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
He Zhang, Linjun Jiang, Jianxin Wu, Ting-Yeh Chen, Junzhan Liu, W. Kang, Weisheng Zhao
SRAM-based computing-in-memory (SRAM-CIM) provides fast speed and good scalability with advanced process technology. However, the energy efficiency of the state-of-the-art current-domain SRAM-CIM bit-cell structure is limited, and the peripheral circuitry (e.g., DACs/ADCs) required for high precision is expensive. This paper proposes a charge-pulsation SRAM (CP-SRAM) structure that achieves ultra-high energy efficiency thanks to its charge-domain mechanism. Furthermore, the proposed CP-SRAM CIM supports configurable precision (2/4/6-bit). The CP-SRAM CIM macro was designed at the 180nm (with silicon verification) and 40nm (simulation) nodes. The simulation results at 40nm show that our macro achieves energy efficiency of ~2950 TOPS/W at 2-bit precision, ~576.4 TOPS/W at 4-bit precision, and ~111.7 TOPS/W at 6-bit precision.
{"title":"CP-SRAM: charge-pulsation SRAM marco for ultra-high energy-efficiency computing-in-memory","authors":"He Zhang, Linjun Jiang, Jianxin Wu, Ting-Yeh Chen, Junzhan Liu, W. Kang, Weisheng Zhao","doi":"10.1145/3489517.3530398","DOIUrl":"https://doi.org/10.1145/3489517.3530398","url":null,"abstract":"SRAM-based computing-in-memory (SRAM-CIM) provides fast speed and good scalability with advanced process technology. However, the energy efficiency of the state-of-the-art current-domain SRAM-CIM bit-cell structure is limited and the peripheral circuitry (e.g., DAC/ADC) for high-precision is expensive. This paper proposes a charge-pulsation SRAM (CP-SRAM) structure to achieve ultra-high energy-efficiency thanks to its charge-domain mechanism. Furthermore, our proposed CP-SRAM CIM supports configurable precision (2/4/6-bit). The CP-SRAM CIM macro was designed in 180nm (with silicon verification) and 40nm (simulation) nodes. The simulation results in 40nm show that our macro can achieve energy efficiency of ~2950Tops/W at 2-bit precision, ~576.4 Tops/W at 4-bit precision and ~111.7 Tops/W at 6-bit precision, respectively.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134644380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abdullah Al Arafat, Sudharsan Vaidhun, Kurt M. Wilson, Jinghao Sun, Zhishan Guo
The Robot Operating System (ROS) is the most popular framework for developing robotics software. Typically, robotics software is safety-critical and employed in real-time systems requiring timing guarantees. Since the first generation of ROS provides no timing guarantees, the recent release of its second generation, ROS2, is necessary and timely, and it has received immense attention from practitioners and researchers. Unfortunately, existing analyses of ROS2 revealed that the peculiar scheduling strategy of the ROS2 executor severely affects the response time of ROS2 applications. This paper proposes a deadline-based scheduling strategy for the ROS2 executor. It further presents an end-to-end response time analysis for ROS2 workloads (processing chains) and an evaluation of the proposed scheduling strategy on real workloads.
{"title":"Response time analysis for dynamic priority scheduling in ROS2","authors":"Abdullah Al Arafat, Sudharsan Vaidhun, Kurt M. Wilson, Jinghao Sun, Zhishan Guo","doi":"10.1145/3489517.3530447","DOIUrl":"https://doi.org/10.1145/3489517.3530447","url":null,"abstract":"Robot Operating System (ROS) is the most popular framework for developing robotics software. Typically, robotics software is safety-critical and employed in real-time systems requiring timing guarantees. Since the first generation of ROS provides no timing guarantee, the recent release of its second generation, ROS2, is necessary and timely, and has since received immense attention from practitioners and researchers. Unfortunately, the existing analysis of ROS2 showed the peculiar scheduling strategy of ROS2 executor, which severely affects the response time of ROS2 applications. This paper proposes a deadline-based scheduling strategy for the ROS2 executor. It further presents an analysis for an end-to-end response time of ROS2 workload (processing chain) and an evaluation of the proposed scheduling strategy for real workloads.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115373432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vasudev Gohil, Satwik Patnaik, Hao Guo, D. Kalathil, J. Rajendran
Insertion of hardware Trojans (HTs) in integrated circuits is a pernicious threat. Since HTs are activated under rare trigger conditions, detecting them using random logic simulations is infeasible. In this work, we design a reinforcement learning (RL) agent that circumvents the exponential search space and returns a minimal set of patterns that is most likely to detect HTs. Experimental results on a variety of benchmarks demonstrate the efficacy and scalability of our RL agent, which obtains a significant reduction (169×) in the number of test patterns required while maintaining or improving coverage (95.75%) compared to the state-of-the-art techniques.
{"title":"DETERRENT: detecting trojans using reinforcement learning","authors":"Vasudev Gohil, Satwik Patnaik, Hao Guo, D. Kalathil, J. Rajendran","doi":"10.1145/3489517.3530518","DOIUrl":"https://doi.org/10.1145/3489517.3530518","url":null,"abstract":"Insertion of hardware Trojans (HTs) in integrated circuits is a pernicious threat. Since HTs are activated under rare trigger conditions, detecting them using random logic simulations is infeasible. In this work, we design a reinforcement learning (RL) agent that circumvents the exponential search space and returns a minimal set of patterns that is most likely to detect HTs. Experimental results on a variety of benchmarks demonstrate the efficacy and scalability of our RL agent, which obtains a significant reduction (169×) in the number of test patterns required while maintaining or improving coverage (95.75%) compared to the state-of-the-art techniques.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115779144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emotional AI, or affective computing, is projected to grow rapidly in the upcoming years. Despite many existing developments in the application space, there has been a lack of hardware-level exploitation of the user's emotions. In this paper, we propose a deep collaboration between the user's affects and hardware system management on resource-limited edge devices. Based on classification results from efficient affect classifiers on smartphone devices, novel real-time management schemes for memory and video processing are proposed to improve the energy efficiency of mobile devices. Case studies on H.264/AVC video playback and Android smartphone usage show significant power savings of up to 23% and a reduction of memory loading of up to 17% using the proposed affect-adaptive architecture and system management schemes.
{"title":"Human emotion based real-time memory and computation management on resource-limited edge devices","authors":"Yijie Wei, Zhiwei Zhong, Jie Gu","doi":"10.1145/3489517.3530490","DOIUrl":"https://doi.org/10.1145/3489517.3530490","url":null,"abstract":"Emotional AI or Affective Computing has been projected to grow rapidly in the upcoming years. Despite many existing developments in the application space, there has been a lack of hardware-level exploitation of the user's emotions. In this paper, we propose a deep collaboration between user's affects and the hardware system management on resource-limited edge devices. Based on classification results from efficient affect classifiers on smartphone devices, novel real-time management schemes for memory, and video processing are proposed to improve the energy efficiency of mobile devices. Case studies on H.264 / AVC video playback and Android smartphone usages are provided showing significant power saving of up to 23% and reduction of memory loading of up to 17% using the proposed affect adaptive architecture and system management schemes.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115849955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hiago Mayk G. de A. Rocha, Janaina Schwarzrock, A. Lorenzon, A. C. S. Beck
This paper proposes PredG, a machine learning framework that enhances graph processing performance by finding the ideal thread and data mapping on NUMA systems. PredG is agnostic to the input graph: it uses the available graphs' features to train an ANN that performs predictions as new graphs arrive, without any application execution once trained. When evaluating PredG over representative graphs and algorithms on three NUMA systems, its solutions are up to 41% faster than the Linux OS Default and the Best Static mappings, on average within 2% of the Oracle, and they present lower energy consumption.
{"title":"Using machine learning to optimize graph execution on NUMA machines","authors":"Hiago Mayk G. de A. Rocha, Janaina Schwarzrock, A. Lorenzon, A. C. S. Beck","doi":"10.1145/3489517.3530581","DOIUrl":"https://doi.org/10.1145/3489517.3530581","url":null,"abstract":"This paper proposes PredG, a Machine Learning framework to enhance the graph processing performance by finding the ideal thread and data mapping on NUMA systems. PredG is agnostic to the input graph: it uses the available graphs' features to train an ANN to perform predictions as new graphs arrive - without any application execution after being trained. When evaluating PredG over representative graphs and algorithms on three NUMA systems, its solutions are up to 41% faster than the Linux OS Default and the Best Static - on average 2% far from the Oracle -, and it presents lower energy consumption.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124381842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Linus Witschen, T. Wiersema, Lucas Reuter, M. Platzner
Approximate logic synthesis aims at trading off a circuit's quality to improve a target metric. Corresponding methods explore a search space by approximating circuit components and verifying the resulting quality of the overall circuit, which is costly. We propose a methodology that determines reasonable values for the components' local error bounds prior to search space exploration. Utilizing formal verification on a novel approximation miter guarantees the circuit's quality for such local error bounds, independent of the employed approximation methods, and reduces runtime thanks to the omitted verifications. Experiments show speed-ups of up to 3.7x for approximate logic synthesis using our method.
{"title":"Search space characterization for approximate logic synthesis","authors":"Linus Witschen, T. Wiersema, Lucas Reuter, M. Platzner","doi":"10.1145/3489517.3530463","DOIUrl":"https://doi.org/10.1145/3489517.3530463","url":null,"abstract":"Approximate logic synthesis aims at trading off a circuit's quality to improve a target metric. Corresponding methods explore a search space by approximating circuit components and verifying the resulting quality of the overall circuit, which is costly. We propose a methodology that determines reasonable values for the component's local error bounds prior to search space exploration. Utilizing formal verification on a novel approximation miter guarantees the circuit's quality for such local error bounds, independent of employed approximation methods, resulting in reduced runtimes due to omitted verifications. Experiments show speed-ups of up to 3.7x for approximate logic synthesis using our method.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124499871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantum circuit placement (QCP) is the process of mapping the synthesized logical quantum programs on physical quantum machines, which introduces additional SWAP gates and affects the performance of quantum circuits. Nevertheless, determining the minimal number of SWAP gates has been demonstrated to be an NP-complete problem. Various heuristic approaches have been proposed to address QCP, but they suffer from suboptimality due to the lack of exploration. Although exact approaches can achieve higher optimality, they are not scalable for large quantum circuits due to the massive design space and expensive runtime. By formulating QCP as a bilevel optimization problem, this paper proposes a novel machine learning (ML)-based framework to tackle this challenge. To address the lower-level combinatorial optimization problem, we adopt a policy-based deep reinforcement learning (DRL) algorithm with knowledge transfer to enable the generalization ability of our framework. An evolutionary algorithm is then deployed to solve the upper-level discrete search problem, which optimizes the initial mapping with a lower SWAP cost. The proposed ML-based approach provides a new paradigm to overcome the drawbacks in both traditional heuristic and exact approaches while enabling the exploration of optimality-runtime trade-off. Compared with the leading heuristic approaches, our ML-based method significantly reduces the SWAP cost by up to 100%. In comparison with the leading exact search, our proposed algorithm achieves the same level of optimality while reducing the runtime cost by up to 40 times.
{"title":"Optimizing quantum circuit placement via machine learning","authors":"Hongxiang Fan, Ce Guo, W. Luk","doi":"10.1145/3489517.3530403","DOIUrl":"https://doi.org/10.1145/3489517.3530403","url":null,"abstract":"Quantum circuit placement (QCP) is the process of mapping the synthesized logical quantum programs on physical quantum machines, which introduces additional SWAP gates and affects the performance of quantum circuits. Nevertheless, determining the minimal number of SWAP gates has been demonstrated to be an NP-complete problem. Various heuristic approaches have been proposed to address QCP, but they suffer from suboptimality due to the lack of exploration. Although exact approaches can achieve higher optimality, they are not scalable for large quantum circuits due to the massive design space and expensive runtime. By formulating QCP as a bilevel optimization problem, this paper proposes a novel machine learning (ML)-based framework to tackle this challenge. To address the lower-level combinatorial optimization problem, we adopt a policy-based deep reinforcement learning (DRL) algorithm with knowledge transfer to enable the generalization ability of our framework. An evolutionary algorithm is then deployed to solve the upper-level discrete search problem, which optimizes the initial mapping with a lower SWAP cost. The proposed ML-based approach provides a new paradigm to overcome the drawbacks in both traditional heuristic and exact approaches while enabling the exploration of optimality-runtime trade-off. Compared with the leading heuristic approaches, our ML-based method significantly reduces the SWAP cost by up to 100%. In comparison with the leading exact search, our proposed algorithm achieves the same level of optimality while reducing the runtime cost by up to 40 times.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114834913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fan Zhang, Zhiyong Wang, Haoting Shen, Bolin Yang, Qianmei Wu, Kui Ren
With rapidly increasing demand for cloud computing, Field-Programmable Gate Arrays (FPGAs) have become popular in cloud datacenters. Although they improve computing performance through flexible hardware acceleration, new security concerns come along as well. For example, unavoidable physical leakage from the Power Distribution Network (PDN) can be utilized by attackers to mount remote Side-Channel Attacks (SCA), such as Correlation Power Attacks (CPA). Remote Fault Attacks (FA) can also be successfully mounted by malicious tenants in a multi-tenant cloud scenario, posing a significant threat to legitimate tenants. Few hardware-based countermeasures defeat both of the aforementioned remote attacks. In this work, we exploit the Time-to-Digital Converter (TDC) and propose a novel defense technique called DARPT (Defense Against Remote Physical attack based on TDC) to protect sensitive information from CPA and FA. Specifically, DARPT produces random clock jitter to reduce possible information leakage through the power side-channel and provides an early warning of FA by constantly monitoring the variation of the voltage drop across the PDN. Whereas 8k traces are enough for a successful CPA on an FPGA without DARPT, our experimental results show that up to 800k traces (100 times more) are not enough for the same FPGA protected by DARPT. Meanwhile, the TDC-based voltage monitor presents significant readout changes (by 51.82% or larger) under FA with ring oscillators, demonstrating sufficient sensitivity to voltage-drop-based FA.
{"title":"DARPT","authors":"Fan Zhang, Zhiyong Wang, Haoting Shen, Bolin Yang, Qianmei Wu, Kui Ren","doi":"10.1145/3489517.3530494","DOIUrl":"https://doi.org/10.1145/3489517.3530494","url":null,"abstract":"With rapidly increasing demands for cloud computing, Field Programmable Gate Array (FPGA) has become popular in cloud datacenters. Although it improves computing performance through flexible hardware acceleration, new security concerns also come along. For example, unavoidable physical leakage from the Power Distribution Network (PDN) can be utilized by attackers to mount remote Side-Channel Attacks (SCA), such as Correlation Power Attacks (CPA). Remote Fault Attacks (FA) can also be successfully presented by malicious tenants in a cloud multi-tenant scenario, posing a significant threat to legal tenants. There are few hardware-based countermeasures to defeat both remote attacks that aforementioned. In this work, we exploit Time-to-Digital Converter (TDC) and propose a novel defense technique called DARPT (Defense Against Remote Physical attack based on TDC) to protect sensitive information from CPA and FA. Specifically, DARPT produces random clock jitters to reduce possible information leakage through the power side-channel and provides an early warning of FA by constantly monitoring the variation of the voltage drop across PDN. In comparison to the fact that 8k traces are enough for a successful CPA on FPGA without DARPT, our experimental results show that up to 800k traces (100 times) are not enough for the same FPGA protected by DARPT. Meanwhile, the TDC-based voltage monitor presents significant readout changes (by 51.82% or larger) under FA with ring oscillators, demonstrating sufficient sensitivities to voltage-drop-based FA.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116024703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}