The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements, often non-Markovian, with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to expressing relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for the various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of satisfaction of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising the probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, Mungojerrie, and we present an experimental evaluation of our technique on benchmark learning problems.
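The difference between the two preference models can be illustrated with a minimal sketch. It does not reproduce the reward translation described in the abstract; it only compares candidate strategies whose satisfaction probabilities are assumed to be already known. The function names, the strategy names, and the numbers are hypothetical.

```python
from typing import Dict, Sequence


def weighted_score(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted preference: a single scalar, the weighted sum of probabilities."""
    return sum(w * p for w, p in zip(weights, probs))


def best_weighted(candidates: Dict[str, Sequence[float]],
                  weights: Sequence[float]) -> str:
    """Pick the strategy with the highest weighted sum."""
    return max(candidates, key=lambda name: weighted_score(candidates[name], weights))


def best_lexicographic(candidates: Dict[str, Sequence[float]]) -> str:
    """Lexicographic preference: compare objective 0 first, break ties on
    objective 1, and so on. Python tuples already compare this way."""
    return max(candidates, key=lambda name: tuple(candidates[name]))


# Hypothetical satisfaction probabilities, listed from the highest-priority
# objective to the lowest-priority one.
candidates = {
    "A": (0.9, 0.1),
    "B": (0.8, 0.9),
}

print(best_weighted(candidates, weights=(0.5, 0.5)))  # B: large gain on objective 2
print(best_lexicographic(candidates))                 # A: any gain on objective 1 wins
```

With equal weights, strategy B wins because its large advantage on the second objective outweighs a small loss on the first; under the lexicographic order, strategy A wins because no improvement on the second objective can compensate for a lower probability on the first. A practical implementation would likely compare probabilities up to a tolerance rather than exactly.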
In this paper we present a novel verification of synchronous channel communication and channel alternation (choice) that takes into account the environment within which these primitives execute. Our work explores the development of a multi-threaded scheduler for a cooperatively scheduled process-oriented language, ProcessJ. We use CSP to produce formal specifications for the implementation of the various parts of the language runtime (scheduler, runtime components, and generated code). We use established CSP specifications that model channel communication and choice, together with the formal verification tool FDR, to formally prove that the implementations are correct and behave as expected when executed by our scheduler (the execution environment). Our approach is novel and not seen in similar research because we consider the behaviour of the systems we examine under the restrictions imposed by an execution environment (e.g., a runtime system, a scheduler, or an operating system) and show that even with such restrictions the channel communication and alternation work. More specifically, we show correctness when a system is executed by the ProcessJ cooperative scheduler. The main contributions of this work are the models defined and the method undertaken to verify cooperatively scheduled channel communication and choice.
Recent years have seen complex event processing (CEP) and its associated query languages grow in popularity and importance. CEP systems can be hard to understand: it is often non-trivial to determine which streams of events are matched by a query, and important edge cases can go unnoticed. Hence, it is desirable to have a formal semantics that permits reasoning about the resulting complex events and checking whether actual matchings agree with our intentions. In this paper we introduce a pattern language (PatLang) with some unique syntactic features related to variable binding. We provide two distinct denotational semantics for PatLang: a minimal semantics, sufficient to describe when patterns match, and a tree semantics, which provides detailed information about the subpatterns with which the matched events actually match, i.e., information about the interpretation of matched events induced by the pattern matching. The tree semantics is unnecessary for verifying the correctness of pattern matching execution. However, we show that neither the minimal semantics nor the semantics from prior work suffices to effectively locate errors in patterns with respect to their intended meaning, and that the additional information provided by the tree semantics is crucial for that purpose. We prove that the tree semantics can be mapped to the minimal semantics. Finally, we provide some practical evaluation.
Universal quantifiers occur frequently in proof obligations produced by program verifiers, for instance, to axiomatize uninterpreted functions and to statically express properties of arrays. SMT-based verifiers typically reason about them via E-matching, an SMT algorithm that requires syntactic matching patterns to guide the quantifier instantiations. Devising good matching patterns is challenging. In particular, overly restrictive patterns may lead to spurious verification errors if the quantifiers needed for a proof are not instantiated; they may also conceal unsoundness caused by inconsistent axiomatizations. In this article, we present the first technique that identifies the effects of overly restrictive matching patterns and helps the users and developers of program verifiers remedy them. We designed a novel algorithm to synthesize the missing triggering terms required to complete unsatisfiability proofs via E-matching. Tool developers can use this information to refine their matching patterns and prevent similar verification errors, or to fix a detected unsoundness.
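The effect of an overly restrictive matching pattern can be sketched with a toy matcher. This is a deliberate simplification and not the algorithm from the abstract: it matches patterns purely syntactically against a list of ground terms, whereas E-matching in an SMT solver works modulo the equalities known to the solver. The term encoding, the function names, and the example axiom (forall x. f(g(x)) = x) are all hypothetical.

```python
# Terms are nested tuples ("f", arg1, ...) or constant strings such as "a".
# Pattern variables are strings that start with "?".

def match(pattern, term, subst):
    """Return an extended substitution if pattern matches term, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return subst if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def subterms(term):
    """Yield the term itself and all of its nested subterms."""
    yield term
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from subterms(arg)


def instantiations(trigger, ground_terms):
    """Collect the substitutions under which the trigger matches a ground subterm."""
    found = []
    for ground in ground_terms:
        for sub in subterms(ground):
            subst = match(trigger, sub, {})
            if subst is not None and subst not in found:
                found.append(subst)
    return found


# Hypothetical axiom: forall x. f(g(x)) = x
restrictive = ("f", ("g", "?x"))   # fires only when f(g(_)) occurs
permissive = ("g", "?x")           # fires whenever g(_) occurs

ground = [("g", "a")]              # the proof obligation mentions only g(a)

print(instantiations(restrictive, ground))                      # no instantiation
print(instantiations(permissive, ground))                       # x bound to a
print(instantiations(restrictive, ground + [("f", ("g", "a"))]))  # x bound to a
```

In this sketch the restrictive pattern never fires on the original ground terms, so the axiom is never instantiated. Either relaxing the pattern or adding the term f(g(a)), which plays the role of a missing triggering term, yields the instantiation.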