Neural networks (NN) are gaining importance in sequential decision-making. Deep reinforcement learning (DRL), in particular, is extremely successful in learning action policies in complex and dynamic environments. Despite this success, however, DRL technology is not without its failures, especially in safety-critical applications: (i) the training objective maximizes average rewards, which may disregard rare but critical situations and hence lack local robustness; (ii) optimization objectives targeting safety typically yield degenerate reward structures that must be replaced with proxy objectives for DRL to work. Here we introduce a methodology that can help to address both deficiencies. We incorporate evaluation stages (ES) into DRL, leveraging recent work on deep statistical model checking (DSMC), which verifies NN policies in Markov decision processes. Our ES apply DSMC at regular intervals to determine state space regions with weak performance. We adapt the subsequent DRL training priorities based on the outcome, (i) focusing DRL on critical situations and (ii) allowing us to foster arbitrary objectives.
We run case studies on two benchmarks. One of them is the Racetrack, an abstraction of autonomous driving that requires navigating a map without crashing into a wall. The other is MiniGrid, a widely used benchmark in the AI community. Our results show that DSMC-based ES can significantly improve both (i) and (ii).
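To make the training loop described above more concrete, the following Python sketch alternates ordinary DRL training phases with evaluation stages that estimate per-start-state goal-reachability and re-weight the start-state distribution toward weakly performing regions. The callables `train_episode` and `goal_reached`, and the plain Monte-Carlo estimate standing in for a DSMC run, are illustrative assumptions, not the implementation used in the paper.

```python
import random
from typing import Callable, Dict, Hashable, List

State = Hashable

def evaluation_stage(goal_reached: Callable[[State], bool],
                     start_states: List[State],
                     runs_per_state: int = 200) -> Dict[State, float]:
    """Statistical stand-in for a DSMC call: estimate, for every start state,
    the probability that the current policy reaches the goal."""
    return {s: sum(goal_reached(s) for _ in range(runs_per_state)) / runs_per_state
            for s in start_states}

def reprioritize(estimates: Dict[State, float]) -> Dict[State, float]:
    """Weight start states inversely to their estimated success probability,
    so the next training phase focuses on weakly performing regions."""
    return {s: (1.0 - p) + 1e-3 for s, p in estimates.items()}

def train_with_evaluation_stages(train_episode: Callable[[State], None],
                                 goal_reached: Callable[[State], bool],
                                 start_states: List[State],
                                 stages: int = 10,
                                 episodes_per_stage: int = 1000) -> None:
    """Interleave DRL training with evaluation stages that adapt the
    start-state sampling distribution based on the evaluation outcome."""
    weights = {s: 1.0 for s in start_states}
    for _ in range(stages):
        for _ in range(episodes_per_stage):
            # sample training start states according to the current priorities
            s = random.choices(start_states,
                               weights=[weights[t] for t in start_states], k=1)[0]
            train_episode(s)
        # evaluation stage: re-estimate performance and update priorities
        weights = reprioritize(evaluation_stage(goal_reached, start_states))
```

The same re-weighting hook could be pointed at objectives other than goal-reachability (e.g., crash probability in Racetrack), which is how the method can foster objectives beyond the training reward.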
Stochastic hybrid automata (SHA) are a powerful tool to evaluate the dependability and safety of critical infrastructures. However, the resolution of nondeterminism, which is present in many purely hybrid models, is often only implicitly considered in SHA. This paper instead proposes algorithms for computing maximum and minimum reachability probabilities for singular automata with urgent transitions and random clocks that follow arbitrary continuous probability distributions. We borrow a well-known approach from hybrid systems reachability analysis, namely flowpipe construction, and extend it to optimize nondeterminism in the presence of random variables. First, valuations of the random clocks that ensure reachability of specific goal states are extracted from the computed flowpipes; second, reachability probabilities are computed by integrating over these valuations. We compute maximum and minimum probabilities for history-dependent prophetic and non-prophetic schedulers using set-based methods. The implementation, which uses the library HyPro, and the complexity of the approach are discussed in detail. Two case studies with nondeterministic choices show the feasibility of the approach.
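The two-step computation (extract valuations, then integrate) can be illustrated with a small Python sketch. Here the valuation sets extracted from the flowpipes are simplified to boxes over independent random clocks, and the optimization over schedulers is reduced to a maximum/minimum over a few candidate sets; the interval data, the chosen distributions, and this simplification are assumptions for illustration, not the paper's set-based method.

```python
from scipy import integrate
from scipy.stats import expon, norm   # arbitrary continuous distributions

# densities of the random clocks (hypothetical choices)
densities = [expon(scale=1.0).pdf, norm(loc=2.0, scale=0.5).pdf]

# hypothetical valuation sets extracted from the flowpipes, one per resolution
# of the nondeterministic choices (i.e., per scheduler)
valuation_sets = {
    "scheduler A": [(0.5, 2.0), (1.0, 3.5)],
    "scheduler B": [(0.0, 1.0), (2.5, 4.0)],
}

def reach_probability(intervals, pdfs):
    """Integrate the joint density of independent random clocks over a
    box-shaped valuation set that ensures reaching the goal states."""
    prob = 1.0
    for (lo, hi), pdf in zip(intervals, pdfs):
        mass, _ = integrate.quad(pdf, lo, hi)
        prob *= mass
    return prob

probs = {name: reach_probability(box, densities)
         for name, box in valuation_sets.items()}
print("maximum reachability probability:", max(probs.values()))
print("minimum reachability probability:", min(probs.values()))
```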
Computational fluid dynamics (CFD) models have been widely used for prototyping data centers. Evolving them into high-fidelity and real-time digital twins is desirable for the online operation of data centers. However, CFD models often have unsatisfactory accuracy and high computation overhead. Manually calibrating the CFD model parameters is tedious and labor-intensive. Existing automatic calibration approaches apply heuristics to search the model configurations. However, each search step requires a long-lasting process of repeatedly solving the CFD model, rendering them impractical, especially for complex CFD models. This paper presents Kalibre, a knowledge-based neural surrogate approach that calibrates a CFD model by iterating four steps: i) training a neural surrogate model, ii) finding the optimal parameters through neural surrogate retraining, iii) configuring the found parameters back into the CFD model, and iv) validating the CFD model using sensor-measured data. Thus, the parameter search is offloaded to the lightweight neural surrogate. To speed up Kalibre’s convergence, we incorporate prior knowledge in training data initialization and surrogate architecture design. With about ten hours of computation on a 64-core processor, Kalibre achieves mean absolute errors (MAEs) of 0.57°C and 0.88°C in calibrating the CFD models of two production data halls hosting thousands of servers. To accelerate CFD-based simulation, we further propose Kalibreduce, which incorporates the energy balance principle to reduce the order of the calibrated CFD model. Evaluation shows that the model reduction introduces only 0.1°C to 0.27°C of extra error, while accelerating the CFD-based simulations by a thousand times.
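A rough sketch of the four-step iteration is shown below: a neural surrogate maps model parameters to predicted sensor temperatures, a cheap search on the surrogate proposes new parameters, and each proposal is validated with one real CFD run before augmenting the training data. The callable `run_cfd`, the parameter bounds, the MLP surrogate, and the random search (standing in for Kalibre's knowledge-based surrogate design and parameter optimization) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def calibrate(run_cfd, measured, param_bounds,
              n_rounds=5, n_init=20, n_candidates=2000):
    """Surrogate-based calibration loop (a sketch of the four steps above)."""
    rng = np.random.default_rng(0)
    lo, hi = np.array(param_bounds, dtype=float).T
    # initial training data: CFD runs at sampled parameter configurations
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))
    Y = np.array([run_cfd(x) for x in X])              # simulated sensor temps
    best = X[0]
    for _ in range(n_rounds):
        # i) train a neural surrogate: parameters -> sensor temperatures
        surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
        surrogate.fit(X, Y)
        # ii) search the (cheap) surrogate for parameters minimizing the MAE
        candidates = rng.uniform(lo, hi, size=(n_candidates, len(lo)))
        mae = np.abs(surrogate.predict(candidates) - measured).mean(axis=1)
        best = candidates[mae.argmin()]
        # iii) configure the found parameters back into the CFD model and run it
        y_best = run_cfd(best)
        # iv) validate against sensor measurements; the new pair augments the data
        print("validation MAE:", np.abs(y_best - measured).mean())
        X, Y = np.vstack([X, best]), np.vstack([Y, y_best])
    return best
```

The expensive CFD solve appears only once per round (step iii/iv), while the thousands of candidate evaluations in step ii touch only the lightweight surrogate, which is the point of offloading the search.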
Adaptive systems manage and regulate the behavior of devices or other systems, using control loops to automatically adjust the value of some measured variables to equal the value of a desired set-point. These systems normally interact with physical parts or operate in physical environments, where uncertainty is unavoidable. Traditional approaches to managing that uncertainty use either robust control algorithms, which consider bounded variations of the uncertain variables and worst-case scenarios, or adaptive control methods, which estimate the parameters and change the control laws accordingly. In this article, we propose to include the sources of uncertainty in the system models as first-class entities, using random variables in order to simulate adaptive and control systems more faithfully. This includes not only using random variables to represent and operate with uncertain values, but also to represent decisions based on comparisons between them. Two exemplar systems are used to illustrate and validate our proposal.
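As a toy illustration of treating uncertainty as a first-class entity, the sketch below wraps Monte-Carlo samples in a small class whose arithmetic propagates uncertainty and whose comparison yields the probability that the relation holds, so a control decision can be conditioned on that probability. The class name, its API, the sampled noise model, and the 95% decision threshold are hypothetical and not taken from the article.

```python
import numpy as np

def _samples(x):
    return x.samples if isinstance(x, UncertainReal) else x

class UncertainReal:
    """An uncertain value represented by Monte-Carlo samples."""
    def __init__(self, samples):
        self.samples = np.asarray(samples, dtype=float)

    # arithmetic propagates uncertainty sample-wise
    def __add__(self, other):
        return UncertainReal(self.samples + _samples(other))

    def __sub__(self, other):
        return UncertainReal(self.samples - _samples(other))

    # comparison returns the *probability* that the relation holds
    def prob_gt(self, other):
        return float(np.mean(self.samples > _samples(other)))

# example: a noisy temperature reading compared against a set-point
rng = np.random.default_rng(1)
measured = UncertainReal(rng.normal(21.3, 0.5, size=10_000))   # sensor noise
setpoint = 22.0
if measured.prob_gt(setpoint) > 0.95:    # decide only with high confidence
    print("above set-point: actuate cooling")
else:
    print("within tolerance: no action")
```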