Pub Date: 2024-10-23, DOI: 10.1016/j.scico.2024.103225
Lara Bargmann, Heike Wehrheim
Weak memory models describe the semantics of concurrent programs in modern multicore architectures. As these semantics deviate from the commonly assumed model of sequential consistency, reasoning techniques like Owicki-Gries-style proof calculi need to be adapted to specific memory models. To avoid having to design a new proof calculus for every new memory model, a uniform approach for axiomatic reasoning has recently been proposed. This approach bases reasoning on memory-model independent axioms about thread views and how they are changed by program actions like reads and writes. It allows program correctness to be proved from the axioms alone. Such proofs are valid for all memory models instantiating the axioms.
In this paper, we study instantiations of the axioms for two memory models, the Partial Store Order (PSO) and the Strong Release Acquire (SRA) model. We see that each model fulfils all but one axiom, though a different one in each case. For PSO, the missing axiom refers to message-passing abilities of memory models; for SRA, the missing axiom refers to the independence of actions on executing threads. We discuss the consequences of these missing axioms and illustrate the reasoning technique on a specific litmus test.
Title: View-based axiomatic reasoning for the weak memory models PSO and SRA. Journal: Science of Computer Programming, vol. 240, Article 103225.
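The message-passing behaviour that the PSO/SRA abstract above refers to can be illustrated with a small sketch. It is not the paper's view-based axiomatisation: it assumes a deliberately crude approximation in which PSO is modelled only by letting the writer's two stores to different locations take effect in either order. The variable names (`data`, `flag`, `r_flag`, `r_data`) and helper functions are hypothetical.

```python
from itertools import permutations

def interleavings(t1, t2):
    """Yield every interleaving of two instruction lists that keeps each list's own order."""
    if not t1:
        yield list(t2)
        return
    if not t2:
        yield list(t1)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest

def run(schedule):
    """Execute one schedule on a single shared memory and return (r_flag, r_data)."""
    mem = {"data": 0, "flag": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "write":
            mem[var] = arg        # arg is the value to store
        else:
            regs[arg] = mem[var]  # arg is the destination register
    return regs["r_flag"], regs["r_data"]

# Message-passing litmus test: the writer publishes data, then raises a flag.
WRITER = [("write", "data", 1), ("write", "flag", 1)]
READER = [("read", "flag", "r_flag"), ("read", "data", "r_data")]

def outcomes(writer_orders):
    """Collect all (r_flag, r_data) results over the given writer orders."""
    return {run(s) for w in writer_orders for s in interleavings(list(w), READER)}

# Sequential consistency: the writer's program order is preserved.
sc_outcomes = outcomes([WRITER])
# Assumed PSO-like approximation: the two stores may take effect in either order.
pso_like_outcomes = outcomes(permutations(WRITER))
```

Under these assumptions the outcome `(1, 0)`, where the reader sees the flag but stale data, is absent from `sc_outcomes` and present in `pso_like_outcomes`. That outcome is the failure of message passing.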
As chip designs become increasingly complex, the potential for errors and defects in circuits inevitably rises, posing significant challenges to chip security and reliability. This study investigates the use of SAT-based bounded model checking (BMC) for Propositional Projection Temporal Logic (PPTL) to verify Verilog chip designs at the register transfer level (RTL). To this end, we propose an algorithm that automatically extracts state transfer relations from an AIGER netlist and constructs a Kripke structure. Additionally, we employ PPTL, which has full regular expressiveness, to describe the circuit properties to be verified, especially periodic repetitive properties. This is not possible with Linear Temporal Logic (LTL) and Computation Tree Logic (CTL). By combining the PPTL properties with finite system paths and transforming them into conjunctive normal forms (CNFs), we utilize a SAT solver for verification. Experimental results demonstrate that our verification tool, SAT-BMC4PPTL, achieves higher verification efficiency and comprehensiveness.
Title: Verifying chip designs at RTL level. Authors: Nan Zhang, Zhijie Xu, Zhenhua Duan, Cong Tian, Wu Wang, Chaofeng Yu. Journal: Science of Computer Programming, vol. 240, Article 103224. Pub Date: 2024-10-22, DOI: 10.1016/j.scico.2024.103224
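The idea of checking a periodic property over bounded paths of a Kripke structure, as described in the RTL verification abstract above, can be sketched in a few lines. This is only an illustration under stated assumptions: explicit path enumeration stands in for the CNF encoding and SAT solver, the property "p holds at every even position" is hard-coded rather than written in PPTL, and the 4-state structure and all names are hypothetical.

```python
# Hypothetical toy Kripke structure: a 2-bit counter that wraps around.
TRANSITIONS = {0: [1], 1: [2], 2: [3], 3: [0]}
LABELS = {0: {"p"}, 1: set(), 2: {"p"}, 3: set()}
INITIAL = [0]

def paths(k):
    """Return all paths with exactly k transitions that start in an initial state."""
    frontier = [[s] for s in INITIAL]
    for _ in range(k):
        frontier = [path + [nxt] for path in frontier for nxt in TRANSITIONS[path[-1]]]
    return frontier

def holds_at_even_positions(path, prop):
    """Periodic property: prop labels the states at positions 0, 2, 4, ... of the path."""
    return all(prop in LABELS[state] for state in path[::2])

def find_counterexample(k, prop):
    """Return a path of k transitions violating the property, or None if there is none."""
    for path in paths(k):
        if not holds_at_even_positions(path, prop):
            return path
    return None
```

For this toy structure, `find_counterexample(6, "p")` finds no violating path within the bound, while `find_counterexample(3, "q")` returns the only path of that length because `q` labels no state. A SAT-based checker would encode the same bounded question as a formula instead of enumerating paths.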
In our previous work, we have developed and tested different visualizations that help analyze fork ecosystems. Our goal is to contribute analyses and tools that support developers as well as researchers in obtaining a better understanding of what happens within such ecosystems. In this article, we focus on the tool implementation of our most recent visualizations, which can help users to better understand the relations between and activities within forks. Since fork ecosystems are widely used in practice and are well-established research subjects, we hope that our tooling constitutes a helpful means for other researchers, too.
Title: VisFork: Towards a toolsuite for visualizing fork ecosystems. Authors: Siyue Chen, Loek Cleophas, Sandro Schulze, Jacob Krüger. Journal: Science of Computer Programming, vol. 241, Article 103223. Pub Date: 2024-10-21, DOI: 10.1016/j.scico.2024.103223
Pub Date: 2024-10-18, DOI: 10.1016/j.scico.2024.103222
José Proença, Luc Edixhoven
We present CAOS: a programming framework for computer-aided design of structural operational semantics for formal models. This framework includes a set of Scala libraries and a workflow to produce visual and interactive diagrams that animate and provide insights over the structure and the semantics of a given abstract model with operational rules.
CAOS follows an approach where theoretical foundations and a practical tool are built together, as an alternative to foundations-first design (“tool justifies theory”) or tool-first design (“foundations justify practice”). The advantage of CAOS is that the tool-under-development can immediately be used to automatically run numerous and sizeable examples in order to identify subtle mistakes, unexpected outcomes, and unforeseen limitations in the foundations-under-development, as early as possible.
More concretely, CAOS supports the quick creation of interactive websites that help the end-users better understand a new language, structure, or analysis. End-users can be research colleagues trying to understand a companion paper or students learning about a new simple language or operational semantics. We include a list of open-source projects with a web frontend supported by CAOS that are used both in research and teaching contexts.
Title: The CAOS framework for Scala: Computer-aided design of SOS. Journal: Science of Computer Programming, vol. 240, Article 103222.
Pub Date: 2024-10-17, DOI: 10.1016/j.scico.2024.103221
Juliane Päßler, Maurice H. ter Beek, Ferruccio Damiani, Einar Broch Johnsen, S. Lizeth Tapia Tarifa
Self-adaptation, meant to increase reliability, is a crucial feature of cyber-physical systems operating in uncertain physical environments. Ensuring safety properties of self-adaptive systems is of utmost importance, especially when operating in remote environments where communication with a human operator is limited, such as under water or in space. This paper presents a software model that allows the analysis of one such self-adaptive system, a configurable underwater robot used for pipeline inspection, by means of the probabilistic model checker ProFeat. Furthermore, it shows that the configurable software model is easily extensible to further, possibly more complex use cases and analyses.
Title: A Configurable Software Model of a Self-Adaptive Robotic System. Journal: Science of Computer Programming, vol. 240, Article 103221.
Pub Date: 2024-10-17, DOI: 10.1016/j.scico.2024.103220
Angelo Ferrando, Rafael C. Cardoso
Runtime Verification is a lightweight formal verification technique used to verify whether a system behaves as expected at runtime. Expected behaviour is typically formally specified using properties, which are used to automatically synthesise monitors. Properties that can be verified at runtime by a monitor are called monitorable, while those that cannot are termed non-monitorable. In this paper, we revisit the notion of monitorability and demonstrate how non-monitorable properties can still be used to generate partial monitors. We tackle this from two different perspectives: (i) by recognising that a monitor can give up on monitoring the property under analysis if it recognises that the monitoring will never conclude the satisfaction or violation of the property; (ii) by recognising that a monitor can give up on events that are not necessary for successful monitoring of the property under analysis. By considering these two aspects, we present how to achieve partial monitoring of Linear Temporal Logic properties by building upon the standard monitor construction. Finally, we present a prototype implementation of our approach and its application to a remote inspection case study, as well as a set of evaluation experiments to stress test our approach using synthetic properties.
Title: Towards partial monitoring: Never too early to give in. Journal: Science of Computer Programming, vol. 240, Article 103220.
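The first perspective in the partial monitoring abstract above, a monitor that gives up once no conclusive verdict can ever be reached, can be sketched as follows. This is a minimal illustration, not the authors' construction: the monitor automaton, its state names, and the event alphabet are hypothetical and hand-written rather than synthesised from an LTL formula.

```python
INCONCLUSIVE, SATISFIED, VIOLATED, GIVE_UP = "?", "satisfied", "violated", "give up"

# Hypothetical monitor automaton: from "wait", event b satisfies the property,
# event c violates it, and event d leads to "limbo", where no verdict is reachable.
TRANSITIONS = {
    ("wait", "a"): "wait",
    ("wait", "b"): "ok",
    ("wait", "c"): "bad",
    ("wait", "d"): "limbo",
}
VERDICTS = {"ok": SATISFIED, "bad": VIOLATED}

def step(state, event):
    """Follow a transition; pairs not listed above are treated as self-loops."""
    return TRANSITIONS.get((state, event), state)

def can_conclude(state, events):
    """Return True if some state carrying a conclusive verdict is reachable from `state`."""
    seen, todo = set(), [state]
    while todo:
        current = todo.pop()
        if current in VERDICTS:
            return True
        if current in seen:
            continue
        seen.add(current)
        todo.extend(step(current, event) for event in events)
    return False

def monitor(trace, events=("a", "b", "c", "d")):
    """Return (verdict, number of events consumed) for a finite trace prefix."""
    state = "wait"
    for index, event in enumerate(trace):
        state = step(state, event)
        if state in VERDICTS:
            return VERDICTS[state], index + 1
        if not can_conclude(state, events):
            return GIVE_UP, index + 1  # stop early: further monitoring is pointless
    return INCONCLUSIVE, len(trace)
```

With this automaton, `monitor(["a", "a", "b"])` reports satisfaction after three events, whereas `monitor(["a", "d", "b"])` gives up after the second event instead of consuming the rest of the trace.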
Pub Date: 2024-10-10, DOI: 10.1016/j.scico.2024.103219
Carlos Diego Nascimento Damasceno, Marie-Christine Jakobs, Leen Lambers, Sebastián Uchitel
Title: Preface for the special issue on “Selected Papers and Tools of the 26th International Conference on Fundamental Approaches to Software Engineering” (FASE 2023). Journal: Science of Computer Programming, vol. 240, Article 103219.
Pub Date: 2024-10-04, DOI: 10.1016/j.scico.2024.103217
Felipe Ferreira, José Campos
As in the classical computing realm, quantum programming languages in quantum computing allow one to instruct a quantum computer to perform certain tasks. In the last 25 years, many imperative, functional, and multi-paradigm quantum programming languages with different features and goals have been developed. However, to the best of our knowledge, no study has investigated who uses quantum languages, how practitioners learn a quantum language, how experienced practitioners are with quantum languages, which quantum languages are most used, in which contexts practitioners use quantum languages, what challenges quantum practitioners face while using quantum languages, whether programs written in quantum languages are tested, and what quantum practitioners' perspectives are on the variety of quantum languages and the potential need for new languages. In this paper, we first conduct a systematic survey to find and collect all quantum languages proposed in the literature and/or by organizations. Secondly, we identify and describe 37 quantum languages. Thirdly, we survey 251 quantum practitioners to answer several research questions about their quantum language usage. Fourthly, we conclude that (i) 58.2% of all practitioners are 25–44 years old, 63.0% have a master's or doctoral degree, and 86.2% have more than five years of experience using classical languages. (ii) 60.6% of practitioners learn quantum languages from the official documentation. (iii) Only 16.3% of practitioners have more than five years of experience with quantum languages. (iv) Qiskit (Python) is the most used quantum language, followed by Cirq (Python) and QDK (Q#). (v) 42.8% use quantum languages for research. (vi) Lack of documentation and usage examples are practitioners' most challenging issues. Practitioners prefer open-source quantum languages with an easy-to-learn syntax (e.g., based on an existing classical language), available documentation and examples, and an active community. (vii) 76.4% of all participants test their quantum programs, and 42.6% test them automatically. (viii) A standard quantum language, perhaps a high-level language, for quantum computation could accelerate the development of quantum programs. Finally, we present a set of suggestions for developers and researchers on the development of new quantum languages or enhancement of existing ones.
Title: An exploratory study on the usage of quantum programming languages. Journal: Science of Computer Programming, vol. 240, Article 103217.
We present JoT, a testing framework for Microservice Architectures (MSAs) based on technology agnosticism, a core principle of microservices. The main advantage of JoT is that it reduces the amount of work for a) testing MSAs whose services use different technology stacks, b) writing tests that involve multiple services, and c) reusing tests of the same MSA under different deployment configurations or after changing some of its components. In JoT, tests are orchestrators that can both consume operations from and offer operations to the MSA under test. The language for writing JoT tests is Jolie, which provides constructs that support technology agnosticism and the definition of terse test behaviours.
Title: JoT: A Jolie framework for testing microservices. Authors: Saverio Giallorenzo, Fabrizio Montesi, Marco Peressotti, Florian Rademacher, Narongrit Unwerawattana. Journal: Science of Computer Programming, vol. 240, Article 103215. Pub Date: 2024-10-03, DOI: 10.1016/j.scico.2024.103215
Pub Date: 2024-10-03, DOI: 10.1016/j.scico.2024.103218
Chen Yang, Peng Liang, Zinan Ma
Context
Stakeholders constantly make assumptions in the development of deep learning (DL) frameworks. These assumptions are related to various types of software artifacts (e.g., requirements, design decisions, and technical debt) and can turn out to be invalid, leading to system failures. Existing approaches and tools for assumption management usually depend on manual identification of assumptions. However, assumptions are scattered in various sources (e.g., code comments, commits, pull requests, and issues) of DL framework development, and manually identifying assumptions has high costs (e.g., time and resources).
Objective
The objective of the study is to evaluate different classification models for identifying assumptions, from the point of view of developers and users, in the context of DL framework projects (i.e., issues, pull requests, and commits) on GitHub.
Method
First, we constructed a new dataset of assumptions, the largest so far (i.e., the AssuEval dataset), collected from the TensorFlow and Keras repositories on GitHub. Then we explored the performance of seven non-transformer-based models (e.g., Support Vector Machine, Classification and Regression Trees), the ALBERT model, and three decoder-only models (i.e., ChatGPT, Claude, and Gemini) for identifying assumptions on the AssuEval dataset.
Results
The study results show that ALBERT achieves the best performance (f1-score: 0.9584) for identifying assumptions on the AssuEval dataset, which is much better than the other models (the 2nd best f1-score is 0.8858, achieved by the Claude 3.5 Sonnet model). Though ChatGPT, Claude, and Gemini are popular models, we do not recommend using them to identify assumptions in DL framework development because of their low performance. Fine-tuning ChatGPT, Claude, Gemini, or other language models (e.g., Llama3, Falcon, and BLOOM) specifically for assumptions might improve their performance for assumption identification.
Conclusions
This study provides researchers with the largest dataset of assumptions for further research (e.g., assumption classification, evaluation, and reasoning) and helps researchers and practitioners better understand assumptions and how to manage them in their projects (e.g., selection of classification models for identifying assumptions).
Title: An exploratory study on automatic identification of assumptions in the development of deep learning frameworks. Journal: Science of Computer Programming, vol. 240, Article 103218.
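For readers comparing the f1-scores reported in the assumption identification abstract above, the following sketch shows how an F1-score is conventionally derived from a binary classifier's confusion counts. The counts in the usage line are made-up illustration values, not results from the study, and the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true positives, false positives, and false negatives.

    Assumes tp > 0, so that none of the denominators is zero.
    """
    precision = tp / (tp + fp)  # share of predicted assumptions that are real assumptions
    recall = tp / (tp + fn)     # share of real assumptions that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical counts for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Because F1 is a harmonic mean, a model needs both high precision and high recall to score well, so a single f1-score summarises both kinds of error.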