Sandeep Banik, Thiagarajan Ramachandran, A. Bhattacharya, S. D. Bopardikar
Security of cyber-physical systems (CPS) continues to pose new challenges due to the tight integration and operational complexity of the cyber and physical components. To address these challenges, this article presents a domain-aware, optimization-based approach to determine an effective defense strategy for CPS in an automated fashion by emulating a strategic adversary in the loop that exploits system vulnerabilities, the interconnection of the CPS, and the dynamics of the physical components. Our approach builds on an adversarial decision-making model based on a Markov Decision Process (MDP) that determines the optimal cyber (discrete) and physical (continuous) attack actions over a CPS attack graph. The defense planning problem is modeled as a non-zero-sum game between the adversary and defender. We use a model-free reinforcement learning method to solve the adversary’s problem as a function of the defense strategy. We then employ Bayesian optimization (BO) to find an approximate best response for the defender to harden the network against the resulting adversary policy. This process is iterated multiple times to improve the strategy for both players. We demonstrate the effectiveness of our approach on a ransomware-inspired graph with a smart building system as the physical process. Numerical studies show that our method converges to a Nash equilibrium for various defender-specific costs of network hardening.
{"title":"Automated Adversary-in-the-Loop Cyber-Physical Defense Planning","authors":"Sandeep Banik, Thiagarajan Ramachandran, A. Bhattacharya, S. D. Bopardikar","doi":"10.1145/3596222","DOIUrl":"https://doi.org/10.1145/3596222","PeriodicalId":7055,"journal":{"name":"ACM Transactions on Cyber-Physical Systems","volume":" ","pages":"1 - 25"},"PeriodicalIF":2.3,"publicationDate":"2023-05-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"44954583","RegionNum":0,"Total":0}
Naomi Stricker, Yingzhao Lian, Yuning Jiang, Colin N. Jones, L. Thiele
Distributed embedded systems are pervasive components jointly operating in a wide range of applications. Moving toward energy-harvesting-powered systems enables their long-term, sustainable, scalable, and maintenance-free operation. When these systems are used as components of an automatic control system to sense a control plant, energy availability limits when and how often sensed data are obtainable and therefore when and how often control updates can be performed. On the one hand, the availability of harvested energy is time-varying and non-deterministic, and the energy usage of the energy harvesting sensor nodes must be planned ahead of time; on the other hand, the automatic control plant’s demand for control updates, and thus for energy, is complex and changes dynamically. These two aspects have to be balanced. We propose a hierarchical approach in which the resources of the energy harvesting sensor nodes are managed over a long time horizon while, on a faster timescale, self-triggered model predictive control regulates the plant. The controller of the harvesting-based nodes’ resources schedules the future energy usage ahead of time, and the self-triggered model predictive control incorporates these time-varying energy constraints. For this novel combination of energy harvesting and automatic control systems, we derive provable properties in terms of correctness, feasibility, and performance. We evaluate the approach on a double integrator and demonstrate its usability and performance in a room temperature and air quality control case study.
{"title":"Self-triggered Control with Energy Harvesting Sensor Nodes","authors":"Naomi Stricker, Yingzhao Lian, Yuning Jiang, Colin N. Jones, L. Thiele","doi":"10.1145/3597311","DOIUrl":"https://doi.org/10.1145/3597311","PeriodicalId":7055,"journal":{"name":"ACM Transactions on Cyber-Physical Systems","volume":"7 1","pages":"1 - 31"},"PeriodicalIF":2.3,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"46026757","RegionNum":0,"Total":0}
Yanmao Man, Ming Li, Ryan M. Gerdes
In vision-based object recognition systems, imaging sensors perceive the environment, and objects are then detected and classified for decision-making purposes, e.g., to maneuver an automated vehicle around an obstacle or to raise alarms for intruders in surveillance settings. In this work, we demonstrate how camera-based perception can be unobtrusively manipulated to enable an attacker to create spurious objects or alter an existing object by remotely projecting adversarial patterns into cameras, exploiting two common effects in optical imaging systems, viz., lens flare/ghost effects and auto-exposure control. To improve the robustness of the attack, we generate optimal patterns by integrating adversarial machine learning techniques with a trained end-to-end channel model. We experimentally demonstrate our attacks using a low-cost projector on three different cameras and under different environments. Results show that, depending on the attack distance, attack success rates can reach as high as 100%, including under targeted conditions. We develop a countermeasure that reduces the problem of detecting ghost-based attacks to verifying whether there is a ghost overlapping with a detected object. We leverage spatiotemporal consistency to eliminate false positives. Evaluation on experimental data provides a worst-case equal error rate of 5%.
{"title":"Remote Perception Attacks against Camera-based Object Recognition Systems and Countermeasures","authors":"Yanmao Man, Ming Li, Ryan M. Gerdes","doi":"10.1145/3596221","DOIUrl":"https://doi.org/10.1145/3596221","PeriodicalId":7055,"journal":{"name":"ACM Transactions on Cyber-Physical Systems","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2023-05-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"49484259","RegionNum":0,"Total":0}
Théo Serru, Nga Nguyen, M. Batteux, A. Rauzy
Discrete event systems are increasingly used as a modeling tool to assess the safety and cybersecurity of complex systems. In both cases, the analysis relies on the extraction of critical sequences. This approach proves to be very powerful. It suffers, however, from the combinatorial explosion of the number of sequences to examine. To push the limits of what is feasible with reasonable computational resources, extraction algorithms use cutoffs and minimality criteria. In this article, we review the principles of extraction algorithms, and we show that there are important differences between critical sequences extracted in the context of safety analyses and those extracted in the context of cybersecurity analyses. Based on this thorough comparison, we introduce a new cutoff criterion, called the footprint, that aims at capturing the willfulness of an intruder performing a cyberattack. We illustrate our presentation by means of three case studies, one focused on the analysis of failures and two focused on the analysis of cyberattacks and their effects on safety. We demonstrate experimentally the value of the footprint criterion.
{"title":"Minimal Critical Sequences in Model-based Safety and Security Analyses: Commonalities and Differences","authors":"Théo Serru, Nga Nguyen, M. Batteux, A. Rauzy","doi":"10.1145/3593811","DOIUrl":"https://doi.org/10.1145/3593811","PeriodicalId":7055,"journal":{"name":"ACM Transactions on Cyber-Physical Systems","volume":"7 1","pages":"1 - 20"},"PeriodicalIF":2.3,"publicationDate":"2023-05-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"49085060","RegionNum":0,"Total":0}
A. Mahmood, Quan Z. Sheng, W. Zhang, Yan Wang, S. Sagar
Recent state-of-the-art advancements within the automotive sector, coupled with the evolution of the promising paradigms of vehicle-to-everything communication and the Internet of Vehicles (IoV), have enabled vehicles to generate and disseminate an enormous amount of safety-critical and non-safety infotainment data in a bid to guarantee highly safe, convenient, and congestion-aware road transport. These dynamic networks require intelligent security measures to ensure that malicious messages, along with the vehicles that disseminate them, are identified and subsequently eliminated in a timely manner so that they are not in a position to harm other vehicles. Failing to do so could jeopardize the entire network, leading to fatalities and injuries amongst road users. Over the years, several researchers have envisaged conventional cryptography-based solutions employing certificates and the public key infrastructure for enhancing the security of vehicular networks. Nevertheless, cryptography-based solutions are not optimal for an IoV network, primarily because the cryptographic schemes could be susceptible to compromised trust authorities and to insider attacks that are highly deceptive in nature, cannot be noticed immediately, and are, therefore, capable of causing catastrophic damage. Accordingly, in this article, a distributed trust management system is proposed that ascertains the trust of all the reputation segments within an IoV network. The envisaged system takes into consideration the salient characteristics of familiarity (assessed via a subjective logic approach), similarity, and timeliness to ascertain the weights of all the reputation segments. Furthermore, an intelligent trust threshold mechanism has been developed for the identification and eviction of misbehaving vehicles. The experimental results suggest the advantages of our proposed IoV-based trust management system in terms of optimizing misbehavior detection and its resilience to various sorts of attacks.
{"title":"Toward a Distributed Trust Management System for Misbehavior Detection in the Internet of Vehicles","authors":"A. Mahmood, Quan Z. Sheng, W. Zhang, Yan Wang, S. Sagar","doi":"10.1145/3594637","DOIUrl":"https://doi.org/10.1145/3594637","PeriodicalId":7055,"journal":{"name":"ACM Transactions on Cyber-Physical Systems","volume":"7 1","pages":"1 - 25"},"PeriodicalIF":2.3,"publicationDate":"2023-05-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"43775819","RegionNum":0,"Total":0}
Mohammed Asiri, N. Saxena, Rigel Gjomemo, P. Burnap
Sophisticated and nation-state attacks on Industrial Control Systems (ICSs) have increased in recent years, as exemplified by Stuxnet and the attack on the Ukrainian power grid. Measures to be taken post-incident are crucial to reduce damage, restore control, and identify the attack actors involved. By monitoring Indicators of Compromise (IOCs), the incident responder can detect malicious activity triggers and respond quickly to a similar intrusion at an earlier stage. However, to implement IOCs in critical infrastructures, we need to understand their contexts and requirements. Unfortunately, there is no survey paper in the literature on IOCs in the ICS environment, and only limited information is provided in research articles. In this article, we describe different standards for IOC representation and discuss the associated challenges that restrict security investigators from developing IOCs in the industrial sectors. We also discuss the potential IOCs against cyber-attacks in ICSs. Furthermore, we conduct a critical analysis of existing works and available tools in this space. We evaluate the effectiveness of the identified IOCs by mapping these indicators to the most frequently targeted attacks in the ICS environment. Finally, we highlight the lessons to be learned from the literature and the future problems in the domain along with the approaches that might be taken.
{"title":"Understanding Indicators of Compromise against Cyber-attacks in Industrial Control Systems: A Security Perspective","authors":"Mohammed Asiri, N. Saxena, Rigel Gjomemo, P. Burnap","doi":"10.1145/3587255","DOIUrl":"https://doi.org/10.1145/3587255","PeriodicalId":7055,"journal":{"name":"ACM Transactions on Cyber-Physical Systems","volume":"7 1","pages":"1 - 33"},"PeriodicalIF":2.3,"publicationDate":"2023-03-14","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"47967497","RegionNum":0,"Total":0}
Abdulrahman Fahim, E. Papalexakis, S. Krishnamurthy, Amit K. Roy Chowdhury, L. Kaplan, T. Abdelzaher
Steerable cameras that can be controlled via a network to retrieve telemetry of interest have become popular. In this paper, we develop a framework called AcTrak to automate a camera’s motion so that it appropriately switches between (a) zooming in on existing targets in a scene to track their activities and (b) zooming out to search for new targets arriving in the area of interest. Specifically, we seek to achieve a good trade-off between the two tasks, i.e., we want to ensure that new targets are observed by the camera before they leave the scene, while also zooming in on existing targets frequently enough to monitor their activities. There exist prior control algorithms for steering cameras to optimize certain objectives; however, to the best of our knowledge, none have considered this problem, and they do not perform well when target activity tracking is required. AcTrak automatically controls the camera’s pan-tilt-zoom (PTZ) configurations using reinforcement learning (RL) to select the best camera position given the current state. Via simulations using real datasets, we show that AcTrak detects newly arriving targets 30% faster than a non-adaptive baseline and rarely misses targets, unlike the baseline, which can miss up to 5% of the targets. We also implement AcTrak to control a real camera and demonstrate that, in comparison with the baseline, it acquires about 2× more high-resolution images of targets.
{"title":"AcTrak: Controlling a Steerable Surveillance Camera using Reinforcement Learning","authors":"Abdulrahman Fahim, E. Papalexakis, S. Krishnamurthy, Amit K. Roy Chowdhury, L. Kaplan, T. Abdelzaher","doi":"10.1145/3585316","DOIUrl":"https://doi.org/10.1145/3585316","PeriodicalId":7055,"journal":{"name":"ACM Transactions on Cyber-Physical Systems","volume":" ","pages":"1 - 27"},"PeriodicalIF":2.3,"publicationDate":"2023-03-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"48191774","RegionNum":0,"Total":0}
S. Chakraborty, S. Jha, Soheil Samii, Philipp Mundhenk
One might argue that automotive and allied domains like robotics serve as the best possible examples of what “cyber-physical systems” (CPS) are. Here, the correctness of the underlying electronics and software (or cyber) components is defined by the dynamics of the vehicle or the robot, viz., the physical components of the system. This shift in perspective on how electronics and software should be modeled and synthesized, and how their correctness should be defined, has led to a tremendous volume of research on CPS in recent times [7, 8, 43, 56]. At the same time, the volume of electronics and software in modern cars has also grown tremendously. Today, high-end cars have more than 100 control computers or electronic control units (ECUs) embedded in them that run hundreds of millions of lines of software code implementing a range of diverse functions. These functions range from engine and brake control to the body and entertainment domains. Cars are also equipped with a variety of cameras, radars, and lidar sensors that are used to perceive the external world and take the appropriate control actions as a part of driver assistance features that are common today. As such features continue to accelerate the evolution and adoption of fully autonomous vehicles, the role of electronics and software in the automotive domain is increasing at an unprecedented pace, and modern automobiles are now aptly referred
{"title":"Introduction to the Special Issue on Automotive CPS Safety & Security: Part 1","authors":"S. Chakraborty, S. Jha, Soheil Samii, Philipp Mundhenk","doi":"10.1145/3579986","DOIUrl":"https://doi.org/10.1145/3579986","PeriodicalId":7055,"journal":{"name":"ACM Transactions on Cyber-Physical Systems","volume":"7 1","pages":"1 - 6"},"PeriodicalIF":2.3,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"48339430","RegionNum":0,"Total":0}
Eugene Vinitsky, Nathan Lichtlé, Kanaad Parvate, A. Bayen
We study the ability of autonomous vehicles (AVs) to improve the throughput of a bottleneck using a fully decentralized control scheme in a mixed autonomy setting. We consider the problem of improving the throughput of a scaled model of the San Francisco–Oakland Bay Bridge: a two-stage bottleneck where four lanes reduce to two and then reduce to one. Although there is extensive work examining variants of bottleneck control in a centralized setting, there has been less study of the challenging multi-agent setting, where the large number of interacting AVs leads to significant optimization difficulties for reinforcement learning (RL) methods. We apply multi-agent reinforcement learning algorithms to this problem and demonstrate that significant improvements in bottleneck throughput, from 20% at a 5% penetration rate to 33% at a 40% penetration rate, can be achieved. We compare our results to a hand-designed feedback controller and demonstrate that our results sharply outperform the feedback controller despite extensive tuning. Additionally, we demonstrate that the RL-based controllers adopt a robust strategy that works across penetration rates, whereas the feedback controllers degrade immediately upon penetration rate variation. We investigate the feasibility of both action and observation decentralization and demonstrate that effective strategies are possible using purely local sensing. Finally, we open-source our code at https://github.com/eugenevinitsky/decentralized_bottlenecks.
Optimizing Mixed Autonomy Traffic Flow with Decentralized Autonomous Vehicles and Multi-Agent Reinforcement Learning. ACM Transactions on Cyber-Physical Systems, vol. 7, no. 1, pp. 1-22. Published 2023-02-09. https://doi.org/10.1145/3582576
Ruihang Wang, Zhi-Ying Cao, Xiaoxia Zhou, Yonggang Wen, Rui Tan
Deep reinforcement learning (DRL) has shown good performance in tackling Markov decision process (MDP) problems. As DRL optimizes a long-term reward, it is a promising approach to improving the energy efficiency of data center cooling. However, enforcing thermal safety constraints during DRL’s state exploration is a major challenge. The widely adopted reward shaping approach adds a negative reward when an exploratory action results in unsafety. Thus, the agent must experience sufficiently many unsafe states before it learns how to prevent unsafety. In this paper, we propose a safety-aware DRL framework for data center cooling control. It applies offline imitation learning and online post-hoc rectification to holistically prevent thermal unsafety during online DRL. In particular, the post-hoc rectification searches for the minimum modification to the DRL-recommended action such that the rectified action will not result in unsafety. The rectification is designed based on a thermal state transition model that is fitted using historical safe operation traces and can extrapolate the transitions to unsafe states explored by DRL. Extensive evaluation for chilled water and direct expansion-cooled data centers in two climate conditions shows that our approach saves 18% to 26.6% of total data center power compared with conventional control and reduces safety violations by 94.5% to 99% compared with reward shaping. We also extend the proposed framework to address data centers with non-uniform temperature distributions for detailed safety considerations. The evaluation shows that our approach saves 14% of power usage compared with PID control while maintaining safety compliance during training.
Green Data Center Cooling Control via Physics-Guided Safe Reinforcement Learning. ACM Transactions on Cyber-Physical Systems. Published 2023-02-01. https://doi.org/10.1145/3582577
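The post-hoc rectification step described in the abstract above (find the minimum modification to the DRL-recommended action so that the predicted next state is safe) can be illustrated with a small sketch. It assumes a linear one-step thermal model, T_next = a*T + b·u + c, and a single upper temperature limit; the model form, the function name `rectify_action`, and all numeric values are hypothetical choices for illustration, not the authors' fitted model or implementation. Under these assumptions, the minimum Euclidean modification is the projection of the action onto a half-space, which has a closed form.

```python
import numpy as np


def rectify_action(u, temp, a, b, c, temp_max):
    """Return the action closest to u (in Euclidean norm) whose predicted
    next temperature a*temp + b @ u + c does not exceed temp_max.

    Assumes a linear one-step thermal model (an illustrative assumption).
    """
    u = np.asarray(u, dtype=float)
    b = np.asarray(b, dtype=float)
    if not np.any(b):
        raise ValueError("b must be non-zero: the action must affect temperature")

    predicted = a * temp + b @ u + c
    excess = predicted - temp_max
    if excess <= 0.0:
        # The recommended action is already predicted to be safe.
        return u.copy()

    # Project u onto the half-space {v : a*temp + b @ v + c <= temp_max}.
    # This is the smallest change to u that removes the predicted excess.
    return u - (excess / (b @ b)) * b


# Hypothetical example: two cooling actuators, both of which lower temperature.
recommended = [1.0, 0.0]
safe = rectify_action(recommended, temp=30.0, a=0.9, b=[-2.0, -1.0], c=5.0, temp_max=28.0)
print(safe)
```

In this sketch the action is left unchanged whenever the model already predicts a safe next state, so the learned policy is only overridden when needed. Actuator bounds are not enforced here; handling them would require a constrained solver rather than the closed-form projection.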