Pub Date: 2026-06-01 | Epub Date: 2026-01-19 | DOI: 10.1016/j.scico.2026.103446
Shaykhah S. Aldosari, Layla S. Aldawsari
Large Language Models (LLMs) represent a significant advancement in artificial intelligence (AI) capabilities, enabling natural and intuitive human-machine interactions. One rapidly evolving AI application is LLM code generation, which can expedite software development by automating code writing, debugging, and optimization. However, despite these enhanced capabilities, essential questions remain regarding the security implications of code generated by these models. This study addresses three key research questions to examine the security risks in LLM-generated code. It examines whether code generated by different open-source LLMs exhibits measurable variation in vulnerability prevalence. It also investigates how the choice of programming language influences the security of LLM-generated code. Finally, it explores the degree to which prompt specificity and construction shape the security of the generated code. Our findings demonstrate differences across all dimensions: LLMs exhibited a variance of up to 136.06, programming languages showed a maximum performance gap of 56%, and prompt engineering achieved up to 77% improvement in security.
Title: Securing LLM code generation: Leveraging prompt engineering to mitigate vulnerabilities across models and languages
Science of Computer Programming, Volume 251, Article 103446.
Pub Date: 2026-06-01 | Epub Date: 2025-12-23 | DOI: 10.1016/j.scico.2025.103433
Xinjie Wei, Chang-Ai Sun, Xiaoyi Zhang, Dave Towey
Context: Log-based anomaly detection (LAD) techniques examine whether continuously generated logs match historically normal patterns, which helps ensure reliability in distributed systems using DevOps. However, complex anomalies can span multiple log-pattern types and thus may only be detected by combining these patterns: relying on any single pattern can cause anomalies to be missed, i.e., false negatives in anomaly detection.
Objective: In this paper, we propose an Anomaly-Detection approach based on Multi-type log-pattern fusion and Multi-model integration (MulAD), which fuses multi-type log patterns into a synthesized representation to detect complex anomalies.
Method: MulAD first rearranges logs by source parameters to decouple interleaved logs and isolate relevant events. It then derives log patterns across five dimensions — semantic, sequential, quantitative, temporal (chronological), and parametric — and fuses them into a unified synthesized pattern. Finally, to detect anomalies, MulAD integrates MABi-LSTM, Transformer, and graph neural network (GNN) models, which are designed to capture temporal and sequential dependencies, contextual information, and structural dependencies, respectively.
Result: We evaluated MulAD on three public datasets (HDFS, BGL, and ThunderBird) and one industrial dataset from the Ray system. Experimental results show that MulAD outperforms all state-of-the-art techniques.
Conclusion: We conclude that MulAD is a promising anomaly-detection technique for complex anomalies in distributed systems.
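Two of the five pattern dimensions above can be illustrated with a minimal sketch: a quantitative pattern (event-count vector) and a sequential pattern (order-preserving index sequence) over a log window. The event vocabulary and window here are hypothetical toy data; MulAD's actual extraction, fusion, and models are far richer.

```python
from collections import Counter

def quantitative_pattern(log_events, vocabulary):
    """Count vector over a fixed event vocabulary (the quantitative dimension)."""
    counts = Counter(log_events)
    return [counts.get(event, 0) for event in vocabulary]

def sequential_pattern(log_events, vocabulary):
    """Index sequence preserving event order (the sequential dimension)."""
    index = {event: i for i, event in enumerate(vocabulary)}
    return [index[e] for e in log_events if e in index]

vocab = ["open", "read", "write", "close"]
window = ["open", "read", "read", "close"]
print(quantitative_pattern(window, vocab))  # [1, 2, 0, 1]
print(sequential_pattern(window, vocab))    # [0, 1, 1, 3]
```

A detector relying only on the count vector would miss order anomalies (e.g. a "read" before "open"), which is the motivation for fusing multiple pattern types.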
Title: MulAD: A log-based anomaly detection approach for distributed systems using multi-pattern and multi-model fusion
Science of Computer Programming, Volume 251, Article 103433.
Bug severity prediction plays a crucial role in software development by enabling timely defect management. Traditional approaches that rely on bug reports are prone to subjective bias, often leading to inaccurate severity assessments. In contrast, source code-based methods can directly learn code representations to more accurately identify potential defects. However, existing source code-based models do not make full use of hierarchical deep semantic information and do not pay enough attention to the intrinsic class imbalance issue. To overcome these challenges, this paper presents the Cost-Adaptive Multi-level sEmantic feature Learning (CAMEL) framework for bug severity prediction. The framework comprises three core modules: the feature extraction module, the Multi-level Semantic Information Fusion (MSIF) module, and the Cost Weight Optimization (CWO) module. Specifically, the feature extraction module leverages CodeBERT to capture multi-level semantic information from source code. The MSIF module then dynamically aggregates layer-specific features from each CodeBERT layer using an LSTM combined with a hierarchical attention mechanism, thereby preserving global semantic integrity. Finally, the CWO module mitigates the class imbalance issue by dynamically adjusting class weight parameters. Experiments conducted on a dataset of 3342 method-level code snippets with varying bug severity levels demonstrate that CAMEL significantly outperforms state-of-the-art methods across key metrics, including F1-Weighted, Precision, Recall, and MCC.
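The cost-weighting idea behind the CWO module can be sketched as inverse-frequency class weights, a static baseline in which rarer severity classes receive larger loss weights. CAMEL adjusts its weights dynamically during training, and the class names below are hypothetical.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: weight = N / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}

# Imbalanced toy label set: 6 minor, 3 major, 1 critical bug.
labels = ["minor"] * 6 + ["major"] * 3 + ["critical"] * 1
w = class_weights(labels)
# "critical" (1 sample) gets 6x the loss weight of "minor" (6 samples),
# so errors on the rare class dominate the weighted loss.
```

These weights would typically be passed to a weighted cross-entropy loss so that the classifier is not biased toward the majority class.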
Title: Cost-adaptive multi-level semantic feature learning for source code based bug severity prediction
Authors: Xiaoke Zhu, Yufeng Shi, Xiaopan Chen, Caihong Yuan, Fumin Qi, Xiao-Yuan Jing
Science of Computer Programming, Volume 251, Article 103444. DOI: 10.1016/j.scico.2026.103444
Pub Date: 2026-05-01 | Epub Date: 2025-12-05 | DOI: 10.1016/j.scico.2025.103420
Daniel Fernando Gómez-Barrera, Luccas Rojas Becerra, Juan Pinzón Roncancio, David Ortiz Almanza, Juan Arboleda, Mario Linares-Vásquez, Rubén Francisco Manrique
This paper presents CFFitST, a novel strategy for iteratively fine-tuning sentence embeddings using a pre-trained sentence transformer to enhance classification performance in few-shot settings. The method dynamically adjusts the number and composition of training samples based on internal assessments over the training data. CFFitST was evaluated in the “NLBSE 2024” tool competition, which focused on multi-class classification of GitHub issues. The competition required robust few-shot learning models to classify 300 issues across five different repositories. Our approach achieved an F1 score of 84.2 %, a statistically significant improvement of 2.44 % over the SetFit baseline.
Title: CFFitST: Classification few-shot fit sentence transformer
Science of Computer Programming, Volume 250, Article 103420.
Pub Date: 2026-05-01 | Epub Date: 2025-11-21 | DOI: 10.1016/j.scico.2025.103416
Michael Hanus
Unintended failures during a computation are painful but frequent during software development. Failures due to external reasons (e.g., missing files, no permissions, etc.) can be caught by exception handlers. Programming failures, such as calling a partially defined operation with unintended arguments, are often not caught due to the assumption that the software is correct. This paper presents an approach to verify such assumptions. For this purpose, non-failure conditions for operations are inferred and then checked in all uses of partially defined operations. In the positive case, the absence of such failures is ensured. In the negative case, the programmer could adapt the program to handle possibly failing situations and check the program again. Our method is fully automatic and can be applied to larger declarative programs. The results of an implementation for functional logic Curry programs are presented.
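The idea of checking inferred non-failure conditions at call sites can be sketched outside of Curry. In this Python analogue (the function names are illustrative, not the paper's), a partially defined operation is guarded by its inferred condition, so the adapted call site provably cannot fail.

```python
def head(xs):
    # Partially defined operation: fails on the empty list.
    return xs[0]

def nonfail_head(xs):
    # Inferred non-failure condition for head: the argument must be non-empty.
    return len(xs) > 0

def first_or(xs, default):
    # Adapted call site: head is only invoked when its non-failure
    # condition holds, so no IndexError can occur here.
    return head(xs) if nonfail_head(xs) else default
```

In the paper's setting this check is performed statically and automatically for Curry programs; the runtime guard above only mirrors the program adaptation a developer would make in the negative case.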
Title: Inferring non-failure conditions for declarative programs
Science of Computer Programming, Volume 250, Article 103416.
Pub Date: 2026-05-01 | Epub Date: 2025-11-15 | DOI: 10.1016/j.scico.2025.103397
Oleg Kiselyov
MetaOCaml is a superset of OCaml for convenient code generation with static guarantees: the generated code is well-formed, well-typed, and well-scoped, by construction. Not only does the produced code always compile; code fragments with a variable escaping its scope are detected already during code generation. MetaOCaml has been employed for compiling domain-specific languages, generic programming, automating tedious specializations in high-performance computing, generating efficient computational kernels, and embedded programming. It is used in education, and has served as inspiration for several other metaprogramming systems.
Best known in MetaOCaml are the types for values representing generated code and the template-based mechanism to produce such values, a.k.a. brackets and escapes. MetaOCaml also features cross-stage persistence, generation of ordinary and mutually-recursive definitions, first-class pattern-matching, and heterogeneous metaprogramming.
The extant implementation of MetaOCaml, first presented at FLOPS 2014, has been continuously evolving. We describe the current design and implementation, stressing particularly notable additions. Among them is a new and efficient translation from typed code templates to code combinators. Scope extrusion detection unexpectedly brought let-insertion, and a conclusive solution to the 20-year-old vexing problem of cross-stage persistence.
Title: MetaOCaml: ten years later – System description
Science of Computer Programming, Volume 250, Article 103397.
Pub Date: 2026-05-01 | Epub Date: 2025-10-25 | DOI: 10.1016/j.scico.2025.103398
Arwa Hameed Alsubhi, Ornela Dardha, Simon J. Gay
This paper introduces Coconut, a C++ tool that uses templates for defining object behaviours and validates them with typestate checking. Coconut employs the GIMPLE intermediate representation (IR) from the GCC compiler’s middle-end phase for static checks, ensuring objects follow valid state transitions as defined in typestate templates. It supports features such as branching, recursion, aliasing, inheritance, and typestate visualisation. We illustrate Coconut’s application in embedded systems, validating their behaviour pre-deployment. We present an experimental study showing that Coconut improves performance and reduces code complexity with respect to the original code, highlighting the benefits of typestate-based verification.
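A typestate constrains which operations are valid in which object state. The sketch below encodes a small, hypothetical CLOSED → OPEN protocol and enforces it dynamically; Coconut's point is to check such transitions statically on GIMPLE IR, so violations are rejected before deployment rather than raised at runtime as here.

```python
class TypestateError(Exception):
    """Raised when an operation is called in the wrong state."""

class Socket:
    """Typestate protocol: CLOSED --open--> OPEN --close--> CLOSED.
    send() is only valid in the OPEN state."""

    def __init__(self):
        self.state = "CLOSED"

    def _expect(self, state):
        if self.state != state:
            raise TypestateError(f"expected state {state}, was {self.state}")

    def open(self):
        self._expect("CLOSED")
        self.state = "OPEN"

    def send(self, data):
        self._expect("OPEN")
        return len(data)

    def close(self):
        self._expect("OPEN")
        self.state = "CLOSED"
```

Calling `send` on a closed socket raises `TypestateError` at runtime; a static typestate checker would report the same misuse at compile time, with no runtime cost.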
Title: Design and Evaluation of Coconut: Typestates for C++
Science of Computer Programming, Volume 250, Article 103398.
Pub Date: 2026-05-01 | Epub Date: 2025-11-22 | DOI: 10.1016/j.scico.2025.103419
Bin Hu, Lizhi Zheng, Dongjin Yu, Yijian Wu, Jie Chen, Tianyi Hu
Code clones have been a hot topic in software engineering for decades. Thanks to the rapid development of clone detection techniques, finding code clones in software systems is no longer difficult, but managing the vast number of detected clones remains an open problem. Typically, clones should be eliminated through refactoring, thereby mitigating the threat to software maintenance. In some situations, however, a clone group may contain several code variants residing in different locations, making refactoring too complicated, as their differences must be analyzed and reconciled first. We therefore need an approach that recognizes clone groups that are easy to refactor or eliminate. In this paper, we first collected large-scale datasets from three different domains and studied the distribution of four different metrics of code clones. We found that the distribution of each metric follows a certain pattern: inner-file clones account for approximately 50 % of clones, and Type-3 clones account for over 45 %. However, the complexity of clone groups cannot be judged from these metrics alone. Based on our findings, we propose a classification approach to help developers distinguish clone groups that are easy to eliminate by refactoring from those that are hard to refactor. We propose four clone feature entropy measures based on information entropy theory: variant entropy, distribution entropy, relation entropy, and syntactic entropy. We then calculate a fused clone entropy as the weighted summation of these four feature entropies. Finally, we use the four feature entropies and the fused entropy to classify or rank code clone groups. Experiments on three application domains show that the proposed clone feature entropy can help developers identify clone groups that are easy to eliminate by refactoring. Manual validation also reveals that the complexity of clone groups does not depend solely on the number of clone instances. This approach provides a new way to manage code clones and offers useful ideas for future clone maintenance research.
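The entropy machinery here is standard Shannon entropy plus a weighted sum. A minimal sketch (the variant counts and weights below are toy values, not the paper's calibrated ones):

```python
import math

def shannon_entropy(counts):
    """Entropy in bits of a frequency distribution,
    e.g. how clone instances spread over variants in a group."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def fused_entropy(feature_entropies, weights):
    """Fused clone entropy: weighted summation of the per-feature entropies
    (variant, distribution, relation, syntactic)."""
    return sum(w * h for w, h in zip(weights, feature_entropies))

# A group of 4 identical clone instances: one variant, zero entropy
# (uniform, easy to refactor).  A group with 4 distinct variants:
# maximal entropy for 4 outcomes, 2 bits (differences must be reconciled).
easy = shannon_entropy([4])        # 0.0
hard = shannon_entropy([1, 1, 1, 1])  # 2.0
```

Groups can then be ranked by fused entropy: lower values indicate more homogeneous groups that are cheaper to eliminate by refactoring.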
Title: Code clone classification based on multi-dimension feature entropy
Science of Computer Programming, Volume 250, Article 103419.
Pub Date: 2026-05-01 | Epub Date: 2025-11-14 | DOI: 10.1016/j.scico.2025.103414
Rafaela Almeida, Sidney Nogueira, Augusto Sampaio
Testing concurrent systems is challenging due to their complex interactions and behaviours, along with the difficulty in reproducing failures. We propose a sound strategy for testing concurrent mobile applications by extracting use cases that capture interleavings of behaviours of existing test cases for individual features. These use cases are then used to create a formal model that is the input for a refinement checking approach to generate test cases that are still sequential but exercise the execution of concurrent features. We introduce a conformance relation, cspioq, which considers quiescent behaviour (absence of output). This relation is based on cspio (which is itself inspired by ioco); cspio does not take quiescence behaviour into account. While ioco as well as cspioco (a denotational semantics for ioco based on CSP) rely on suspension traces, our approach adopts the traces model annotated with a special event to represent quiescence. This allowed us to reuse our previous theory and test case generation strategy for sequential systems in a conservative way. We also analyse the complexity of automatically generating test cases. For implementation efficiency, we optimise the strategy by directly interleaving steps of existing test cases and show that this preserves soundness. Moreover, we provide tool support for every phase of the approach. Finally, we present the results of an empirical evaluation designed to measure the effectiveness of the overall strategy in terms of test coverage and bug detection. The results indicate that our approach yields higher coverage and higher bug detection rates compared to the set of tests originally developed by our industrial partner (Motorola) engineers.
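Directly interleaving the steps of existing sequential test cases, as in the optimised strategy, can be sketched by enumerating all order-preserving interleavings of two step lists. There are C(|a|+|b|, |a|) of them, which is one reason a complexity analysis of test generation matters; this recursion is a naive illustration, not the paper's tool.

```python
def interleavings(a, b):
    """All interleavings of step lists a and b that preserve
    each test case's internal order."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    # Either the next step comes from a, or it comes from b.
    return [[a[0]] + rest for rest in interleavings(a[1:], b)] + \
           [[b[0]] + rest for rest in interleavings(a, b[1:])]

# Two feature test cases with 2 and 1 steps: C(3, 1) = 3 interleavings.
combined = interleavings(["tapA", "checkA"], ["tapB"])
```

Each resulting sequence is still a sequential test, but it exercises the concurrent execution of both features.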
{"title":"Combining sequential feature test cases to generate sound tests for concurrent features","authors":"Rafaela Almeida , Sidney Nogueira , Augusto Sampaio","doi":"10.1016/j.scico.2025.103414","DOIUrl":"10.1016/j.scico.2025.103414","url":null,"abstract":"<div><div>Testing concurrent systems is challenging due to their complex interactions and behaviours, along with the difficulty in reproducing failures. We propose a sound strategy for testing concurrent mobile applications by extracting use cases that capture interleavings of behaviours of existing test cases for individual features. These use cases are then used to create a formal model that is the input for a refinement checking approach to generate test cases that are still sequential but exercise the execution of concurrent features. We introduce a conformance relation, <strong>cspio</strong><sub><strong>q</strong></sub>, which considers quiescent behaviour (absence of output). This relation is based on <strong>cspio</strong> (which is itself inspired by <strong>ioco</strong>); <strong>cspio</strong> does not take quiescence behaviour into account. While <strong>ioco</strong> as well as <strong>cspioco</strong> (a denotational semantics for <strong>ioco</strong> based on CSP) rely on suspension traces, our approach adopts the traces model annotated with a special event to represent quiescence. This allowed us to reuse our previous theory and test case generation strategy for sequential systems in a conservative way. We also analyse the complexity of automatically generating test cases. For implementation efficiency, we optimise the strategy by directly interleaving steps of existing test cases and show that this preserves soundness. Moreover, we provide tool support for every phase of the approach. Finally, we present the results of an empirical evaluation designed to measure the effectiveness of the overall strategy in terms of test coverage and bug detection. 
The results indicate that our approach yields higher coverage and higher bug detection rates compared to the set of tests originally developed by our industrial partner (Motorola) engineers.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"250 ","pages":"Article 103414"},"PeriodicalIF":1.4,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145580046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-05-01Epub Date: 2025-11-13DOI: 10.1016/j.scico.2025.103411
Yuzhou Liu , Qi Wang , Shuang Jiang , Runze Wu , Hongxu Tian , Peng Zhang
Context: Many researchers have proposed vulnerability detection methods that enhance software reliability by analyzing the program. However, some vulnerabilities are difficult to identify from source code alone, especially those related to execution.
Objectives: To solve this problem, this paper introduces binary code as an additional input and proposes a novel solution for software vulnerability detection based on multimodal information fusion.
Methods: The approach treats source and binary code as different modalities and uses two pre-trained models as feature extractors to analyze them separately. Then, we design an attention-based information fusion strategy that takes the information from the source code as the main body and the information from the binary code as a supplement. This strategy not only captures the correlations among features across modalities but also filters redundancy from the binary code during fusion. In this way, a more comprehensive representation of the software is obtained and taken as the basis for vulnerability detection.
Results: Our method was comprehensively evaluated on three widely used datasets in different languages, namely Reveal (C), Devign (C++), and Code_vulnerability_java (Java): (1) For vulnerability detection performance, the Accuracy reached 86.09%, 84.58%, and 80.43% across the three datasets, with F1-scores of 82.87%, 84.62%, and 79.58%, respectively; (2) Compared with seven state-of-the-art baseline methods, our approach achieved Accuracy improvements of 2.38%-3.01% and F1-score enhancements of 2.32%-8.47% across the datasets; (3) Moreover, the ablation experiment shows that, when combining binary code with source code (versus using source code alone), Accuracy improved by 6.83%-13.76% and F1-score increased by 5.36%-9.86%, demonstrating significant performance gains from multimodal data integration.
Conclusion: The results show that our approach achieves good performance on the task of software vulnerability detection. Meanwhile, ablation experiments confirm the contribution of binary code to detection and indicate the effectiveness of our fusion strategy. We have released the code and datasets (https://github.com/Wangqxn/Vul-detection) to facilitate follow-up research.
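The attention-based fusion strategy described above (source-code features as the main body, binary-code features as a supplement) can be sketched as scaled dot-product attention with a residual connection. This is a minimal, dependency-free illustration under stated assumptions — `attention_fuse`, `src`, and `bin_feats` are hypothetical names, and the real system operates on features from two pre-trained models rather than raw vectors:

```python
import math

def attention_fuse(src, bin_feats):
    """Fuse binary-code feature vectors into source-code feature vectors.
    Each source vector attends over all binary vectors (scaled dot-product
    attention), and the attended 'supplement' is added back residually,
    so the source modality remains the main body.
    NOTE: illustrative sketch only, not the paper's implementation."""
    d_k = len(src[0])
    fused = []
    for q in src:
        # attention scores of this source vector against every binary vector
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in bin_feats]
        # numerically stable softmax over the binary vectors
        m = max(scores)
        exp = [math.exp(s - m) for s in scores]
        total = sum(exp)
        weights = [e / total for e in exp]
        # weighted sum of binary vectors = the supplement from the binary modality
        supplement = [sum(w * k[i] for w, k in zip(weights, bin_feats))
                      for i in range(d_k)]
        # residual add keeps the source features dominant
        fused.append([qi + si for qi, si in zip(q, supplement)])
    return fused
```

The softmax weighting is one plausible way to realise the abstract's "filter the redundancy" claim: binary vectors that correlate poorly with a source vector receive low weight and contribute little to the fused representation.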
{"title":"Multimodal information fusion for software vulnerability detection based on both source and binary codes","authors":"Yuzhou Liu , Qi Wang , Shuang Jiang , Runze Wu , Hongxu Tian , Peng Zhang","doi":"10.1016/j.scico.2025.103411","DOIUrl":"10.1016/j.scico.2025.103411","url":null,"abstract":"<div><div>Context: Many researchers have proposed vulnerability detection methods that enhance software reliability by analyzing the program. However, some vulnerabilities are difficult to identify from source code alone, especially those related to execution.</div><div>Objectives: To solve this problem, this paper introduces binary code as an additional input and proposes a novel solution for software vulnerability detection based on multimodal information fusion.</div><div>Methods: The approach treats source and binary code as different modalities and uses two pre-trained models as feature extractors to analyze them separately. Then, we design an attention-based information fusion strategy that takes the information from the source code as the main body and the information from the binary code as a supplement. This strategy not only captures the correlations among features across modalities but also filters redundancy from the binary code during fusion. In this way, a more comprehensive representation of the software is obtained and taken as the basis for vulnerability detection.</div><div>Results: Our method was comprehensively evaluated on three widely used datasets in different languages, namely Reveal (C), Devign (C++), and Code_vulnerability_java (Java): (1) For vulnerability detection performance, the Accuracy reached 86.09%, 84.58%, and 80.43% across the three datasets, with F1-scores of 82.87%, 84.62%, and 79.58%, respectively; (2) Compared with seven state-of-the-art baseline methods, our approach achieved Accuracy improvements of 2.38%-3.01% and F1-score enhancements of 2.32%-8.47% across the datasets; (3) Moreover, the ablation experiment shows that, when combining binary code with source code (versus using source code alone), Accuracy improved by 6.83%-13.76% and F1-score increased by 5.36%-9.86%, demonstrating significant performance gains from multimodal data integration.</div><div>Conclusion: The results show that our approach achieves good performance on the task of software vulnerability detection. Meanwhile, ablation experiments confirm the contribution of binary code to detection and indicate the effectiveness of our fusion strategy. 
We have released the code and datasets <span><span>(https://github.com/Wangqxn/Vul-detection)</span></span> to facilitate follow-up research.</div></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"250 ","pages":"Article 103411"},"PeriodicalIF":1.4,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145624898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}