Modelling and verifying BDI agents under uncertainty
Pub Date: 2024-12-03 | DOI: 10.1016/j.scico.2024.103254
Blair Archibald, Michele Sevegnani, Mengwei Xu
Belief-Desire-Intention (BDI) agents feature uncertain beliefs (e.g. sensor noise), probabilistic action outcomes (e.g. attempting an action and failing), and non-deterministic choices (e.g. which plan to execute next). To be safely applied in real-world scenarios, we need to reason about such agents; for example, we need the probability of mission success and the strategies used to maximise it. Most agents do not currently consider uncertain beliefs; instead, a belief either holds or does not. We show how to use epistemic states to model uncertain beliefs, and define a Markov Decision Process semantics for the Conceptual Agent Notation (Can) agent language that supports uncertain beliefs; non-deterministic event, plan, and intention selection; and probabilistic action outcomes. The model is executable using an automated tool—CAN-verify—that supports error checking, agent simulation, and exhaustive exploration via an encoding to Bigraphs that produces transition systems for probabilistic model checkers such as PRISM. These model checkers allow reasoning over quantitative properties and strategy synthesis. Using the examples of an autonomous submarine and drone surveillance, together with scalability experiments, we demonstrate that our approach supports uncertain belief modelling, quantitative model checking, and strategy synthesis in practice.
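A minimal sketch of the uncertain-belief idea in Java (chosen as it is the implementation language used elsewhere in this issue): an epistemic state assigns probabilities to belief atoms, a belief "holds" only above a confidence threshold, and an action outcome is probabilistic. All names and the 0.8 success probability are illustrative assumptions, not part of Can or CAN-verify.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Illustrative sketch only: an epistemic state as a probability mass over
// belief atoms, queried against a confidence threshold, plus an action with
// probabilistic outcomes. Names (EpistemicState, SUCCESS_PROB) are our own.
public class EpistemicBeliefDemo {
    static class EpistemicState {
        private final Map<String, Double> belief = new HashMap<>();
        void observe(String atom, double probability) { belief.put(atom, probability); }
        // A belief "holds" only if its probability clears the threshold,
        // replacing the classical holds/does-not-hold dichotomy.
        boolean holds(String atom, double threshold) {
            return belief.getOrDefault(atom, 0.0) >= threshold;
        }
    }

    static final double SUCCESS_PROB = 0.8; // assumed action success probability
    static final Random RNG = new Random(42);

    // Probabilistic action outcome: attempting the action may fail.
    static boolean attemptSurvey(EpistemicState s) {
        if (!s.holds("at_waypoint", 0.9)) return false; // precondition on an uncertain belief
        return RNG.nextDouble() < SUCCESS_PROB;
    }

    public static void main(String[] args) {
        EpistemicState s = new EpistemicState();
        s.observe("at_waypoint", 0.95); // noisy sensor reading, 95% confidence
        System.out.println("survey succeeded: " + attemptSurvey(s));
    }
}
```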
{"title":"Modelling and verifying BDI agents under uncertainty","authors":"Blair Archibald , Michele Sevegnani , Mengwei Xu","doi":"10.1016/j.scico.2024.103254","DOIUrl":"10.1016/j.scico.2024.103254","url":null,"abstract":"<div><div>Belief-Desire-Intention (BDI) agents feature uncertain beliefs (e.g. sensor noise), probabilistic action outcomes (e.g. attempting and action and failing), and non-deterministic choices (e.g. what plan to execute next). To be safely applied in real-world scenarios we need reason about such agents, for example, we need probabilities of mission success and the <em>strategies</em> used to maximise this. Most agents do not currently consider uncertain beliefs, instead a belief either holds or does not. We show how to use epistemic states to model uncertain beliefs, and define a Markov Decision Process for the semantics of the Conceptual Agent Notation (<span>Can</span>) agent language allowing support for uncertain beliefs, non-deterministic event, plan, and intention selection, and probabilistic action outcomes. The model is executable using an automated tool—<span>CAN-verify</span>—that supports error checking, agent simulation, and exhaustive exploration via an encoding to Bigraphs that produces transition systems for probabilistic model checkers such as PRISM. These model checkers allow reasoning over quantitative properties and strategy synthesis. Using the example of an autonomous submarine and drone surveillance together with scalability experiments, we demonstrate our approach supports uncertain belief modelling, quantitative model checking, and strategy synthesis in practice.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"242 ","pages":"Article 103254"},"PeriodicalIF":1.5,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143167308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The SGSM framework: Enabling the specification and monitor synthesis of safe driving properties through scene graphs
Pub Date: 2024-12-02 | DOI: 10.1016/j.scico.2024.103252
Trey Woodlief, Felipe Toledo, Sebastian Elbaum, Matthew B. Dwyer
As autonomous vehicles (AVs) become mainstream, assuring that they operate in accordance with safe driving properties becomes paramount. The ability to specify and monitor driving properties is at the center of such assurance. Yet, the mismatch between the semantic space over which typical driving properties are asserted (e.g., vehicles, pedestrians) and the sensed inputs of AVs (e.g., images, point clouds) poses a significant assurance gap. Related efforts bypass this gap either by assuming that data at the right semantic level is available or by developing bespoke methods for capturing such data. Our recent Scene Graph Safety Monitoring (SGSM) framework addresses this challenge by extracting scene graphs (SGs) from sensor inputs to capture the entities related to the AV, specifying driving properties using a domain-specific language that enables building propositions over those graphs and composing them through temporal logic, and synthesizing monitors to detect property violations. In this paper we further explain, formalize, analyze, and extend the SGSM framework, producing SGSM++. This extension is significant in that it adds the ability to encode the semantics of resetting a property violation, enabling the framework to count the number and duration of violations.
We implemented SGSM++ to monitor for violations of 9 properties of 3 AVs from the CARLA Autonomous Driving Leaderboard, confirming the viability of the framework. SGSM++ found that the AVs violated 71% of the properties during at least one test, with almost 1,400 unique violations across 30 total test executions and violations lasting up to 9.25 minutes. Artifact available at https://github.com/less-lab-uva/ExtendingSGSM.
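To make the reset semantics concrete, here is a hedged Java sketch of a per-frame monitor that counts distinct violations and their durations, given a propositional verdict already evaluated over each scene graph. The SG extraction and the SGSM DSL itself are out of scope here, and the 10 Hz frame rate and all names are our assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: reset semantics for violation counting.
// When the property holds again, the open violation is closed and recorded,
// so a later failure counts as a new, distinct violation.
public class ResetViolationMonitor {
    private static final double FRAME_PERIOD_S = 0.1; // assumed 10 Hz sensor frames
    private boolean inViolation = false;
    private int framesInViolation = 0;
    private final List<Double> violationDurations = new ArrayList<>();

    // Feed one frame's verdict: true = property satisfied on this frame.
    void step(boolean satisfied) {
        if (!satisfied) {
            inViolation = true;
            framesInViolation++;
        } else if (inViolation) {
            violationDurations.add(framesInViolation * FRAME_PERIOD_S);
            inViolation = false;
            framesInViolation = 0;
        }
    }

    void finish() { step(true); } // close any open violation at end of test

    public static void main(String[] args) {
        ResetViolationMonitor m = new ResetViolationMonitor();
        boolean[] verdicts = {true, false, false, true, true, false, true};
        for (boolean v : verdicts) m.step(v);
        m.finish();
        System.out.println("violations: " + m.violationDurations.size()
                + ", durations (s): " + m.violationDurations); // 2 violations: 0.2 s and 0.1 s
    }
}
```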
{"title":"The SGSM framework: Enabling the specification and monitor synthesis of safe driving properties through scene graphs","authors":"Trey Woodlief, Felipe Toledo, Sebastian Elbaum, Matthew B. Dwyer","doi":"10.1016/j.scico.2024.103252","DOIUrl":"10.1016/j.scico.2024.103252","url":null,"abstract":"<div><div>As autonomous vehicles (AVs) become mainstream, assuring that they operate in accordance with safe driving properties becomes paramount. The ability to specify and monitor driving properties is at the center of such assurance. Yet, the mismatch between the semantic space over which typical driving properties are asserted (e.g., vehicles, pedestrians) and the sensed inputs of AVs (e.g., images, point clouds) poses a significant assurance gap. Related efforts bypass this gap by either assuming that data at the right semantic level is available, or they develop bespoke methods for capturing such data. Our recent Scene Graph Safety Monitoring (SGSM) framework addresses this challenge by extracting scene graphs (SGs) from sensor inputs to capture the entities related to the AV, specifying driving properties using a domain-specific language that enables building propositions over those graphs and composing them through temporal logic, and synthesizing monitors to detect property violations. Through this paper we further explain, formalize, analyze, and extend the SGSM framework, producing SGSM++. This extension is significant in that it incorporates the ability for the framework to encode the semantics of <em>resetting</em> a property violation, enabling the framework to count the quantity and duration of violations.</div><div>We implemented SGSM++ to monitor for violations of 9 properties of 3 AVs from the CARLA Autonomous Driving Leaderboard, confirming the viability of the framework, which found that the AVs violated 71% of properties during at least one test including almost 1400 unique violations over 30 total test executions, with violations lasting up to 9.25 minutes. Artifact available at <span><span>https://github.com/less-lab-uva/ExtendingSGSM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"242 ","pages":"Article 103252"},"PeriodicalIF":1.5,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143167306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing the coverage of W-based conformance testing methods over code faults
Pub Date: 2024-11-22 | DOI: 10.1016/j.scico.2024.103234
Khaled El-Fakih, Faiz Hassan, Ayman Alzaatreh, Nina Yevtushenko
We present novel empirical assessments of prominent finite state machine (FSM) conformance test derivation methods against their coverage of code faults. We consider a number of realistic extended FSM examples with their related Java implementations and derive for these examples complete test suites using the W method and its HSI and H derivatives, considering the case when the implementation under test (IUT) has the same number of states as the specification FSM. We also consider W++, HSI++, and H++ test suites, derived for the case when the IUT can have one extra state. For each pair of considered test suites, we determine whether there is a difference between the pair in covering the implementation faults. If the difference is significant, we determine which test suite outperforms the other. We run two other assessments which show that the obtained results are not due to the size or length of the test suites. In addition, we conduct assessments to determine whether each of the methods has better coverage of certain classes of faults than others and whether the W method outperforms the HSI and H methods over only certain classes of faults. The results and outcomes of the conducted experiments are summarized. Major artifacts used in the assessments are provided as benchmarks for further studies.
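As a rough illustration of the W method's structure (not the paper's derivation tooling), the following Java sketch assembles a test suite as the transition cover P concatenated with a characterizing set W. Here the FSM and W are supplied by hand, and the W++/HSI++/H++ variants for an extra IUT state require further machinery.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: W-method test suite = P . W, where P is the
// transition cover (state cover S plus S extended by each input) and W a
// characterizing set, here hand-picked rather than computed.
public class WMethodSketch {
    // Specification FSM: nextState[state][input]; two states, two inputs.
    static final int[][] NEXT = {{1, 0}, {0, 1}};

    static List<List<Integer>> transitionCover() {
        int n = NEXT.length, k = NEXT[0].length;
        @SuppressWarnings("unchecked")
        List<Integer>[] access = new List[n]; // shortest access sequence per state
        access[0] = new ArrayList<>();
        ArrayDeque<Integer> queue = new ArrayDeque<>(List.of(0));
        while (!queue.isEmpty()) {            // BFS from the initial state
            int s = queue.poll();
            for (int a = 0; a < k; a++) {
                int t = NEXT[s][a];
                if (access[t] == null) {
                    access[t] = new ArrayList<>(access[s]);
                    access[t].add(a);
                    queue.add(t);
                }
            }
        }
        List<List<Integer>> cover = new ArrayList<>();
        for (int s = 0; s < n; s++) {
            cover.add(new ArrayList<>(access[s]));      // state cover S
            for (int a = 0; a < k; a++) {
                List<Integer> seq = new ArrayList<>(access[s]);
                seq.add(a);                             // S concatenated with each input
                cover.add(seq);
            }
        }
        return cover;
    }

    public static void main(String[] args) {
        List<List<Integer>> w = List.of(List.of(0)); // assumed characterizing set
        for (List<Integer> p : transitionCover())
            for (List<Integer> dist : w) {
                List<Integer> test = new ArrayList<>(p);
                test.addAll(dist);
                System.out.println(test); // each test: access seq + input + distinguishing seq
            }
    }
}
```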
{"title":"Assessing the coverage of W-based conformance testing methods over code faults","authors":"Khaled El-Fakih , Faiz Hassan , Ayman Alzaatreh , Nina Yevtushenko","doi":"10.1016/j.scico.2024.103234","DOIUrl":"10.1016/j.scico.2024.103234","url":null,"abstract":"<div><div>We present novel empirical assessments of prominent finite state machine (FSM) conformance test derivation methods against their coverage of code faults. We consider a number of realistic extended FSM examples with their related Java implementations and derive for these examples complete test suites using the <em>W</em> method and its <em>HSI</em> and <em>H</em> derivatives considering the case when the implementation under test (IUT) has the same number of states as the specification FSM. We also consider <span><math><msup><mrow><mi>W</mi></mrow><mrow><mo>+</mo><mo>+</mo></mrow></msup></math></span>, <span><math><mi>H</mi><mi>S</mi><msup><mrow><mi>I</mi></mrow><mrow><mo>+</mo><mo>+</mo></mrow></msup></math></span>, and <span><math><msup><mrow><mi>H</mi></mrow><mrow><mo>+</mo><mo>+</mo></mrow></msup></math></span> test suites derived considering the case when the IUT can have one more extra state. For each pair of considered test suites, we determine if there is a difference between the pair in covering the implementations faults. If the difference is significant, we determine which test suite outperforms the other. We run two other assessments which show that the obtained results are not due to the size or length of the test suites. In addition, we conduct assessments to determine whether each of the methods has better coverage of certain classes of faults than others and whether the <em>W</em> outperforms the <em>HSI</em> and <em>H</em> methods over only certain classes of faults. The results and outcomes of conducted experiments are summarized. Major artifacts used in the assessments are provided as benchmarks for further studies.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"241 ","pages":"Article 103234"},"PeriodicalIF":1.5,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142720058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis and formal specification of OpenJDK's BitSet: Proof files
Pub Date: 2024-11-17 | DOI: 10.1016/j.scico.2024.103232
Andy S. Tatman, Hans-Dieter A. Hiep, Stijn de Gouw
This artifact [1] (accompanying our iFM 2023 paper [2]) describes the software we developed that contributed towards our analysis of OpenJDK's BitSet class. This class represents a vector of bits that grows as needed. Our analysis exposed numerous bugs. In our paper, we proposed and compared a number of solutions supported by formal specifications. Full mechanical verification of the BitSet class is not yet possible due to limited support for bitwise operations in KeY and bugs in BitSet. Our artifact contains proofs for a subset of the methods and new proof rules to support bitwise operators.
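For readers unfamiliar with the class under analysis, a small runnable example of java.util.BitSet's grow-as-needed behaviour; this illustrates the documented API, not the bugs the paper exposes.

```java
import java.util.BitSet;

// length() is the logical extent (highest set bit + 1), size() the number of
// bits actually allocated, cardinality() the number of set bits.
public class BitSetGrowth {
    public static void main(String[] args) {
        BitSet bits = new BitSet();              // starts with one 64-bit backing word
        System.out.println(bits.size());         // 64
        bits.set(1000);                          // grows the backing array on demand
        System.out.println(bits.length());       // 1001: highest set bit + 1
        System.out.println(bits.size());         // >= 1001, rounded up to whole words
        System.out.println(bits.cardinality());  // 1
    }
}
```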
{"title":"Analysis and formal specification of OpenJDK's BitSet: Proof files","authors":"Andy S. Tatman , Hans-Dieter A. Hiep , Stijn de Gouw","doi":"10.1016/j.scico.2024.103232","DOIUrl":"10.1016/j.scico.2024.103232","url":null,"abstract":"<div><div>This artifact <span><span>[1]</span></span> (accompanying our iFM 2023 paper <span><span>[2]</span></span>) describes the software we developed that contributed towards our analysis of OpenJDK's <span>BitSet</span> class. This class represents a vector of bits that grows as needed. Our analysis exposed numerous bugs. In our paper, we proposed and compared a number of solutions supported by formal specifications. Full mechanical verification of the <span>BitSet</span> class is not yet possible due to limited support for bitwise operations in KeY and bugs in BitSet. Our artifact contains proofs for a subset of the methods and new proof rules to support bitwise operators.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"241 ","pages":"Article 103232"},"PeriodicalIF":1.5,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142702235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parametric ontologies in formal software engineering
Pub Date: 2024-11-16 | DOI: 10.1016/j.scico.2024.103231
Achim D. Brucker, Idir Ait-Sadoune, Nicolas Méric, Burkhart Wolff
Isabelle/DOF is an ontology framework on top of Isabelle/HOL. It allows for the formal development of ontologies and continuous conformity-checking of integrated documents, including the tracing of typed meta-data of documents. Isabelle/DOF integrates deeply into the Isabelle/HOL ecosystem, allowing users to write documents containing (informal) text, executable code, (formal and semiformal) definitions, and proofs. Users of Isabelle/DOF can either use HOL or one of the many formal methods that have been embedded into Isabelle/HOL to express formal parts of their documents.
In this paper, we extend Isabelle/DOF with annotations of λ-terms, a pervasive data structure underlying Isabelle that syntactically represents expressions and formulas. We achieve this by using Higher-Order Logic (HOL) itself for query-expressions and data-constraints (ontological invariants) executed via code generation and reflection. Moreover, we add support for parametric ontological classes, thus exploiting HOL's polymorphic type system.
The benefits are: First, the HOL representation allows for flexible and efficient run-time checking of abstract properties of formal content under evolution. Second, it is possible to prove properties over generic ontological classes. We demonstrate these new features by a number of smaller ontologies from various domains and a case study using a substantial ontology for formal system development targeting certification according to CENELEC 50128.
{"title":"Parametric ontologies in formal software engineering","authors":"Achim D. Brucker , Idir Ait-Sadoune , Nicolas Méric , Burkhart Wolff","doi":"10.1016/j.scico.2024.103231","DOIUrl":"10.1016/j.scico.2024.103231","url":null,"abstract":"<div><div>Isabelle/DOF is an ontology framework on top of Isabelle/HOL. It allows for the formal development of ontologies and continuous conformity-checking of integrated documents, including the tracing of typed meta-data of documents. Isabelle/DOF deeply integrates into the Isabelle/HOL ecosystem, allowing to write documents containing (informal) text, executable code, (formal and semiformal) definitions, and proofs. Users of Isabelle/DOF can either use HOL or one of the many formal methods that have been embedded into Isabelle/HOL to express formal parts of their documents.</div><div>In this paper, we extend Isabelle/DOF with annotations of <figure><img></figure>-terms, a pervasive data-structure underlying Isabelle to syntactically represent expressions and formulas. We achieve this by using Higher-order Logic (HOL) itself for query-expressions and data-constraints (ontological invariants) executed via code-generation and reflection. Moreover, we add support for <em>parametric</em> ontological classes, thus exploiting HOL's polymorphic type system.</div><div>The benefits are: First, the HOL representation allows for flexible and efficient run-time checking of abstract properties of formal content under evolution. Second, it is possible to prove properties over generic ontological classes. We demonstrate these new features by a number of smaller ontologies from various domains and a case study using a substantial ontology for formal system development targeting certification according to CENELEC 50128.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"241 ","pages":"Article 103231"},"PeriodicalIF":1.5,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142702236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CAN-Verify: Automated analysis for BDI agents
Pub Date: 2024-11-15 | DOI: 10.1016/j.scico.2024.103233
Mengwei Xu, Blair Archibald, Michele Sevegnani
We present CAN-Verify, an automated tool for analysing BDI agents written in the Conceptual Agent Notation (Can) language. CAN-Verify includes support for syntactic error detection before agent execution, agent program interpretation (running agents), and model checking of agent programs (analysing agents). The model checking supports verifying the correctness of agents against both generic agent requirements, such as whether a task is accomplished, and user-defined requirements, such as certain beliefs eventually holding. The latter can be expressed in structured natural language, allowing the tool to be used by agent programmers without formal training in the underlying verification techniques.
{"title":"CAN-Verify: Automated analysis for BDI agents","authors":"Mengwei Xu , Blair Archibald , Michele Sevegnani","doi":"10.1016/j.scico.2024.103233","DOIUrl":"10.1016/j.scico.2024.103233","url":null,"abstract":"<div><div>We present <span>CAN-Verify</span>, an automated tool for analysing BDI agents written in the Conceptual Agent Notation (<span>Can</span>) language. <span>CAN-Verify</span> includes support for syntactic error detection before agent execution, agent program interpretation (running agents), and model-checking of agent programs (analysing agents). The model checking supports verifying the correctness of agents against both generic agent requirements, such as if a task is accomplished, and user-defined requirements, such as certain beliefs eventually holding. The latter can be expressed in structured natural language, allowing the tool to be used by agent programmers without formal training in the underlying verification techniques.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"241 ","pages":"Article 103233"},"PeriodicalIF":1.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142702234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient interaction-based offline runtime verification of distributed systems with lifeline removal
Pub Date: 2024-11-13 | DOI: 10.1016/j.scico.2024.103230
Erwan Mahe, Boutheina Bannour, Christophe Gaston, Pascale Le Gall
Runtime Verification (RV) refers to a family of techniques in which system executions are observed and checked against formal specifications, with the aim of identifying faults. In offline RV, observation and verification are done in two separate and successive steps. In this paper, we define an approach to offline RV of Distributed Systems (DS) against interactions. Interactions are formal models describing communications within a DS. A DS is composed of subsystems deployed on different machines and interacting via message passing to achieve common goals. Therefore, observing executions of a DS entails logging a collection of local execution traces, one for each subsystem, collected on its host machine. We call such observational artifacts multi-traces. A major challenge in analyzing multi-traces is that there are no practical means to synchronize the ends of observation of all the local traces. We address this via an operation called lifeline removal, which we apply on-the-fly to the specification during the verification of a multi-trace once a local trace has been entirely analyzed. This operation removes from the interaction the specification of actions occurring on the subsystem that is no longer observed. This may allow further execution of the specification by removing potential deadlocks. We prove the correctness of the resulting RV algorithm and introduce two optimization techniques, which we also prove correct. We implement a Partial Order Reduction (POR) technique by selecting a one-unambiguous action (as a unique first step to a linearization) whose existence is determined via the lifeline removal operator. Additionally, Local Analyses (LOC), i.e., the verification of local traces, can be leveraged during the global multi-trace analysis to prove failure more quickly. Experiments illustrate the application of our RV approach and the benefits of our optimizations.
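A heavily simplified Java sketch of the lifeline-removal idea: the paper operates on structured interaction terms (with loops and alternatives), whereas here an interaction is flattened to an action sequence and removal is plain filtering. All record and method names are our own.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: once the local trace of a lifeline has been
// entirely analyzed, its actions are removed from the specification so the
// remaining lifelines can still make progress.
public class LifelineRemovalSketch {
    enum Kind { EMIT, RECEIVE }
    record Action(String lifeline, Kind kind, String message) {}

    static List<Action> removeLifeline(List<Action> interaction, String lifeline) {
        return interaction.stream()
                .filter(a -> !a.lifeline().equals(lifeline))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Action> spec = List.of(
                new Action("client", Kind.EMIT, "req"),
                new Action("server", Kind.RECEIVE, "req"),
                new Action("server", Kind.EMIT, "resp"),
                new Action("client", Kind.RECEIVE, "resp"));
        // The client's local trace has been fully checked; drop its actions:
        System.out.println(removeLifeline(spec, "client"));
    }
}
```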
{"title":"Efficient interaction-based offline runtime verification of distributed systems with lifeline removal","authors":"Erwan Mahe , Boutheina Bannour , Christophe Gaston , Pascale Le Gall","doi":"10.1016/j.scico.2024.103230","DOIUrl":"10.1016/j.scico.2024.103230","url":null,"abstract":"<div><div>Runtime Verification (RV) refers to a family of techniques in which system executions are observed and confronted to formal specifications, with the aim of identifying faults. In offline RV, observation and verification are done in two separate and successive steps. In this paper, we define an approach to offline RV of Distributed Systems (DS) against interactions. Interactions are formal models describing communications within a DS. A DS is composed of subsystems deployed on different machines and interacting via message passing to achieve common goals. Therefore, observing executions of a DS entails logging a collection of local execution traces, one for each subsystem, collected on its host machine. We call <em>multi-trace</em> such observational artifacts. A major challenge in analyzing multi-traces is that there are no practical means to synchronize the ends of observations of all the local traces. We address this via an operation called lifeline removal, which we apply on-the-fly to the specification during the verification of a multi-trace once a local trace has been entirely analyzed. This operation removes from the interaction the specification of actions occurring on the subsystem that is no longer observed. This may allow further execution of the specification by removing potential deadlock. We prove the correctness of the resulting RV algorithm and introduce two optimization techniques, which we also prove correct. We implement a Partial Order Reduction (POR) technique by selecting a one-unambiguous action (as a unique first step to a linearization) whose existence is determined via the lifeline removal operator. Additionally, Local Analyses (LOC), i.e., the verification of local traces, can be leveraged during the global multi-trace analysis to prove failure more quickly. Experiments illustrate the application of our RV approach and the benefits of our optimizations.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"241 ","pages":"Article 103230"},"PeriodicalIF":1.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142702237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Verification of forward simulations with thread-local, step-local proof obligations
Pub Date: 2024-11-12 | DOI: 10.1016/j.scico.2024.103227
Gerhard Schellhorn, Stefan Bodenmüller, Wolfgang Reif
This paper presents a proof technique for proving refinements for general state-based models of concurrent systems that reduces proving forward simulations to thread-local, step-local proof obligations. The approach has been implemented in our theorem prover KIV, which translates imperative programs to a set of transition rules and generates proof obligations accordingly. Instances of this proof technique should also be applicable to systems specified with ASM rules, B events, or Z operations. To exemplify the proof methodology, we demonstrate it with two case studies. The first verifies linearizability of a lock-free implementation of concurrent hash sets by showing that it refines an abstract concurrent system with atomic operations. The second applies the proof technique to the verification of opacity of Transactional Mutex Locks (TML), a Software Transactional Memory algorithm. Compared to the standard approach of proving a forward simulation directly, both case studies show a significant reduction in proof effort.
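For orientation, the textbook shape of a forward-simulation proof obligation is sketched below in LaTeX; this is the generic condition, not KIV's exact thread-local, step-local formulation.

```latex
% Every concrete step from a related state must be matched by some abstract
% step (or a stutter) that re-establishes the simulation relation R between
% abstract states (as) and concrete states (cs).
\[
\forall\, as,\, cs,\, cs'.\;
  R(as, cs) \land \bigl(cs \xrightarrow{\,C\,} cs'\bigr)
  \;\Longrightarrow\;
  \exists\, as'.\; \bigl(as \xrightarrow{\,A\,} as' \lor as' = as\bigr)
  \land R(as', cs')
\]
```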
{"title":"Verification of forward simulations with thread-local, step-local proof obligations","authors":"Gerhard Schellhorn, Stefan Bodenmüller, Wolfgang Reif","doi":"10.1016/j.scico.2024.103227","DOIUrl":"10.1016/j.scico.2024.103227","url":null,"abstract":"<div><div>This paper presents a proof technique for proving refinements for general state-based models of concurrent systems that reduces proving forward simulations to thread-local, step-local proof obligations. The approach has been implemented in our theorem prover KIV, which translates imperative programs to a set of transition rules and generates proof obligations accordingly. Instances of this proof technique should also be applicable to systems specified with ASM rules, B events, or Z operations. To exemplify the proof methodology, we demonstrate it with two case studies. The first verifies linearizability of a lock-free implementation of concurrent hash sets by showing that it refines an abstract concurrent system with atomic operations. The second applies the proof technique to the verification of opacity of Transactional Mutex Locks (TML), a Software Transactional Memory algorithm. Compared to the standard approach of proving a forward simulation directly, both case studies show a significant reduction in proof effort.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"241 ","pages":"Article 103227"},"PeriodicalIF":1.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
API comparison based on the non-functional information mined from Stack Overflow
Pub Date: 2024-11-06 | DOI: 10.1016/j.scico.2024.103228
Zhiqi Chen, Yuzhou Liu, Lei Liu, Huaxiao Liu, Ren Li, Peng Zhang
When comparing similar APIs, developers tend to distinguish them by their functional details. At the same time, some important non-functional factors (such as performance, usability, and security) may be ignored, or only noticed after the API has been used in a project. This can result in unnecessary errors or extra costs. API-related questions are common on Stack Overflow, and they can give a well-rounded picture of the APIs; this provides a rich resource for API comparison. However, although many methods are offered for mining Questions and Answers (Q&As) automatically, they often suffer from two main problems: 1) they focus only on the functional information of APIs; 2) they analyze each text in isolation and ignore the correlations among them. In this paper, we propose an approach based on the pre-trained model BERT to mine the non-functional information of APIs from Stack Overflow: we first tease out the correlations among questions, answers, and corresponding reviews, so that one Q&A can be analyzed as a whole; then an information extraction model is constructed by fine-tuning BERT separately on three subtasks—entity identification, aspect classification, and sentiment analysis—and we use it to mine the texts in Q&As step by step; finally, we summarize and visualize the results in a user-friendly way, so that developers can understand the information intuitively at the start of API selection. We evaluate our approach on 4,456 Q&As collected from Stack Overflow. The results show our approach can identify the correlations among reviews with 90.1% precision, and such information improves the performance of the data mining process. In addition, a survey of experienced developers and novices indicates the understandability and helpfulness of our method. Moreover, compared with language models, our method provides more intuitive and concise information for API comparison in non-functional aspects.
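A speculative Java sketch of the three-stage shape of such a pipeline (entity identification, aspect classification, sentiment analysis), with the fine-tuned BERT classifiers stubbed out by keyword rules; every name here is our own illustration, not the paper's implementation.

```java
// Illustrative sketch only: each Stage would be a separately fine-tuned BERT
// classifier; here they are stand-in keyword rules so the pipeline runs.
public class ApiOpinionPipeline {
    record Opinion(String api, String aspect, String sentiment) {}

    interface Stage { String apply(String text); }

    static Opinion mine(String sentence, Stage entity, Stage aspect, Stage sentiment) {
        return new Opinion(entity.apply(sentence), aspect.apply(sentence),
                sentiment.apply(sentence));
    }

    public static void main(String[] args) {
        Stage entity = s -> s.contains("HashMap") ? "java.util.HashMap" : "unknown";
        Stage aspect = s -> s.toLowerCase().contains("slow") ? "performance" : "usability";
        Stage sentiment = s -> s.toLowerCase().contains("slow") ? "negative" : "positive";
        System.out.println(mine("HashMap is slow for this workload",
                entity, aspect, sentiment));
    }
}
```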
{"title":"API comparison based on the non-functional information mined from Stack Overflow","authors":"Zhiqi Chen , Yuzhou Liu , Lei Liu , Huaxiao Liu , Ren Li , Peng Zhang","doi":"10.1016/j.scico.2024.103228","DOIUrl":"10.1016/j.scico.2024.103228","url":null,"abstract":"<div><div>When comparing similar APIs, developers tend to distinguish them from the aspects of functional details. At the same time, some important non-functional factors (such as performance, usability, and security) may be ignored or noticed after using the API in the project. This may result in unnecessary errors or extra costs. API-related questions are common on Stack Overflow, and they can give a well-rounded picture of the APIs. This provides us with a rich resource for API comparison. However, although many methods are offered for mining Questions and Answers (Q&As) automatically, they often suffer from two main problems: 1) they only focus on the functional information of APIs; 2) they analyze each text in isolation but ignore the correlations among them. In this paper, we propose an approach based on the pre-training model BERT to mine the non-functional information of APIs from Stack Overflow: we first tease out the correlations among questions, answers as well as corresponding reviews, so that one Q&A can be analyzed as a whole; then, an information extraction model is constructed by fine-tuning BERT with three subtasks—entity identification, aspect classification, and sentiment analysis separately, and we use it to mine the texts in Q&As step by step; finally, we summarize and visualize the results in a user-friendly way, so that developers can understand the information intuitively at the beginning of API selection. We evaluate our approach on 4,456 Q&As collected from Stack Overflow. The results show our approach can identify the correlations among reviews with 90.1% precision, and such information can improve the performance of the data mining process. In addition, the survey on maturers and novices indicates the understandability and helpfulness of our method. Moreover, compared with language models, our method can provide more intuitive and brief information for API comparison in non-functional aspects.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"241 ","pages":"Article 103228"},"PeriodicalIF":1.5,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An empirical evaluation of a formal approach versus ad hoc implementations in robot behavior planning
Pub Date: 2024-10-31 | DOI: 10.1016/j.scico.2024.103226
Jan Vermaelen, Tom Holvoet
As autonomous robotic systems integrate into various domains, ensuring their safe operation becomes increasingly crucial. A key challenge is guaranteeing safe decision making for cyber-physical systems, given the inherent complexity and uncertainty of real-world environments.
Tools like Gwendolen, vGOAL, and Tumato enable the use of formal methods to provide guarantees for correct and safe decision making. This paper concerns Tumato, a formal planning framework that generates complete behavior from a declarative specification. Tumato ensures safety by avoiding unsafe actions and states while achieving robustness by considering nondeterministic outcomes of actions. While formal methods claim to manage complexity, provide safety guarantees, and ensure robustness, empirical evaluation is necessary to validate these claims.
This work presents an empirical study comparing the characteristics of various ad hoc behavior planning implementations (developed by participants with diverse levels of experience in computer science) with implementations using Tumato. We investigate the usability of the different approaches and evaluate i) their effectiveness, ii) the safety (guarantees) achieved, iii) their robustness in handling uncertainties, and iv) their adaptability, extensibility, and scalability. To our knowledge, this is the first participant-based empirical study of a formal approach for (safe and robust) autonomous behavior.
Our analysis confirms that while ad hoc methods offer some development flexibility, they lack the rigorous safety guarantees provided by formal methods. The study supports the hypothesis that formal methods, as implemented in Tumato, are effective tools for developing safe autonomous systems, particularly in managing complexity and ensuring robust decision making and planning.
{"title":"An empirical evaluation of a formal approach versus ad hoc implementations in robot behavior planning","authors":"Jan Vermaelen, Tom Holvoet","doi":"10.1016/j.scico.2024.103226","DOIUrl":"10.1016/j.scico.2024.103226","url":null,"abstract":"<div><div>As autonomous robotic systems integrate into various domains, ensuring their safe operation becomes increasingly crucial. A key challenge is guaranteeing safe decision making for cyber-physical systems, given the inherent complexity and uncertainty of real-world environments.</div><div>Tools like Gwendolen, vGOAL, and Tumato enable the use of formal methods to provide guarantees for correct and safe decision making. This paper concerns Tumato, a formal planning framework that generates complete behavior from a declarative specification. Tumato ensures safety by avoiding unsafe actions and states while achieving robustness by considering nondeterministic outcomes of actions. While formal methods claim to manage complexity, provide safety guarantees, and ensure robustness, empirical evaluation is necessary to validate these claims.</div><div>This work presents an empirical study comparing the characteristics of various ad hoc behavior planning implementations (developed by participants with diverse levels of experience in computer science), with implementations using Tumato. We investigate the usability of the different approaches and evaluate i) their effectiveness, ii) the achieved safety (guarantees), iii) their robustness in handling uncertainties, and iv) their adaptability, extensibility, and scalability. To our knowledge, this is the first participant-based empirical study of a formal approach for (safe and robust) autonomous behavior.</div><div>Our analysis confirms that while ad hoc methods offer some development flexibility, they lack the rigorous safety guarantees provided by formal methods. The study supports the hypothesis that formal methods, as implemented in Tumato, are effective tools for developing safe autonomous systems, particularly in managing complexity and ensuring robust decision making and planning.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"241 ","pages":"Article 103226"},"PeriodicalIF":1.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142586890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}