Pub Date : 2024-04-05 DOI: 10.1016/j.scico.2024.103116
Antonio Iannopollo, Inigo Incer, Alberto L. Sangiovanni-Vincentelli
We provide a method to synthesize an LTL Assume/Guarantee (A/G) specification, or contract, as an interconnection of elements from a library, each of which is also represented by an LTL A/G contract. Our approach, based on counterexample-guided inductive synthesis, leverages an off-the-shelf model checker to reason about infinite-length counterexamples and guarantee correctness. To increase scalability, we also introduce a novel concept of specification decomposition, based on contract projections; we show how it can be used to break down our synthesis problem into several simpler tasks, without reducing the size of the solution space. We test our technique on three industry-relevant case studies.
{"title":"Synthesizing LTL contracts from component libraries using rich counterexamples","authors":"Antonio Iannopollo , Inigo Incer , Alberto L. Sangiovanni-Vincentelli","doi":"10.1016/j.scico.2024.103116","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103116","url":null,"abstract":"<div><p>We provide a method to synthesize an LTL Assume/Guarantee (A/G) specification, or contract, as an interconnection of elements from a library, each of which is also represented by an LTL A/G contract. Our approach, based on counterexample-guided inductive synthesis, leverages an off-the-shelf model checker to reason about infinite-length counterexamples and guarantee correctness. To increase scalability, we also introduce a novel concept of specification decomposition, based on contract projections; we show how it can be used to break down our synthesis problem into several simpler tasks, without reducing the size of the solution space. We test our technique on three industry-relevant case studies.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"236 ","pages":"Article 103116"},"PeriodicalIF":1.3,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S016764232400039X/pdfft?md5=045427a44a22a0758d49ce6073f362eb&pid=1-s2.0-S016764232400039X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140640999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-04-05 DOI: 10.1016/j.scico.2024.103117
Kazumasa Shimari, Takashi Ishio, Tetsuya Kanda, Katsuro Inoue
Debugging is an important task for identifying defects in software. Logging, in particular, is an important feature of a software system for recording runtime information. Detailed logging allows developers to collect run-time information when they cannot use an interactive debugger, such as on continuous integration servers or web application servers. However, extensive logging leads to large execution traces because a few instructions may be repeated many times. In our previous work, to record detailed program behavior within limited storage space constraints, we proposed near-omniscient debugging, a methodology that records and visualizes an execution trace using fixed-size buffers for each observed instruction. In this paper, we evaluate the effectiveness of near-omniscient debugging in recording infected states while reducing the size of execution traces. We conduct experiments on the Defects4J dataset and evaluate the effectiveness in terms of completeness, trace size, and runtime overhead. The results show that near-omniscient debugging can completely record infected states for nearly 80 percent of bugs (with a buffer size of 1024 events). The size of execution traces can be reduced by a factor of one thousand for large repetitive executions.
Evaluating the effectiveness of size-limited execution trace with near-omniscient debugging. Science of Computer Programming, Volume 236, Article 103117.
Pub Date : 2024-04-03 DOI: 10.1016/j.scico.2024.103112
Stephannie Jimenez, Gordana Rakić, Silvia Takahashi, Nicolás Cardozo
Clone detection provides insight into replicated fragments in a code base. With the rise of multi-language code bases, new techniques addressing cross-language code clone detection enable the analysis of polyglot systems. Such techniques have not yet been applied to the domain of mobile apps, which are naturally polyglot. Native mobile app developers must synchronize their code base across at least two different programming languages. App synchronization is a difficult and time-consuming maintenance task, as features can rapidly diverge between platforms and feature identification must be performed manually. The end goal of this work is to provide an analysis framework that reduces the impact of app synchronization. A first step in this direction consists of a structural algorithm for cross-language clone detection, called Out of Step, exploiting the idea behind enriched concrete syntax trees. Such trees are used as a common intermediate representation, built from the programming languages' grammars, to detect similarities between app code bases. Our technique finds code similarities of over 80% in the evaluation of language features, where Type 1-3 clones are manually injected for the analysis of both single- and cross-language cases for Kotlin and Dart. We validate the feasibility and correctness of our approach by evaluating the main language constructs of Kotlin and Dart. To validate the effectiveness, we use a first case study detecting clones between 12 sorting algorithms across Kotlin and Dart, identifying clone similarities with a precision between 67% and 95%. Finally, we use a corpus of 144 mobile apps implemented in Kotlin and Dart, correctly identifying code similarities for the full application logic.
Out of step: Code clone detection for mobile apps across different language codebases. Science of Computer Programming, Volume 236, Article 103112.
Pub Date : 2024-04-03 DOI: 10.1016/j.scico.2024.103113
Joabe Jesus, Augusto Sampaio
Compositional deadlock analysis of process networks is a well-known challenge. We propose a compositional deadlock analysis strategy for timed process networks, more specifically, those obtained from Simulink multi-rate block diagrams. We handle models with both acyclic and cyclic communication graphs. Particularly, the latter naturally happens in Simulink models with feedback, among other kinds of cycles. Since there is no general solution to analyse cyclic models in a compositional way, we explore the use of behavioural patterns that allow the verification to be carried out in a compositional fashion. We represent process networks in tock-CSP, a dialect of CSP that allows modelling time aspects using a special tock event. The verification approach is implemented as a new package in CSP-Prover, a theorem prover for CSP which is itself implemented in Isabelle/HOL. To illustrate the overall approach and, particularly, how it can scale, we consider several variations of an actuation system with increasing complexity. We show that the examples are instances of the client/server and the asynchronous dynamic timed behaviour patterns. These patterns and all verification steps are formalised using CSP-Prover.
{"title":"Local deadlock analysis of Simulink models based on timed behavioural patterns and theorem proving","authors":"Joabe Jesus, Augusto Sampaio","doi":"10.1016/j.scico.2024.103113","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103113","url":null,"abstract":"<div><p>Compositional deadlock analysis of process networks is a well-known challenge. We propose a compositional deadlock analysis strategy for timed process networks, more specifically, those obtained from <span>Simulink</span> multi-rate block diagrams. We handle models with both acyclic and cyclic communication graphs. Particularly, the latter naturally happens in <span>Simulink</span> models with feedback, among other kinds of cycles. Since there is no general solution to analyse cyclic models in a compositional way, we explore the use of behavioural patterns that allow the verification to be carried out in a compositional fashion. We represent process networks in <em><span>tock</span></em>-<em>CSP</em>, a dialect of <em>CSP</em> that allows modelling time aspects using a special tock event. The verification approach is implemented as a new package in <em>CSP</em>-<em>Prover</em>, a theorem prover for <em>CSP</em> which is itself implemented in <em>Isabelle</em>/<em>HOL</em>. To illustrate the overall approach and, particularly, how it can scale, we consider several variations of an actuation system with increasing complexity. We show that the examples are instances of the client/server and the asynchronous dynamic timed behaviour patterns. These patterns and all verification steps are formalised using <em>CSP</em>-<em>Prover</em>.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"236 ","pages":"Article 103113"},"PeriodicalIF":1.3,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140552585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-31 DOI: 10.1016/j.scico.2024.103114
Muhammad Iqbal, Muhammad Ijaz, Tehseen Mazhar, Tariq Shahzad, Qamar Abbas, Yazeed Yasin Ghadi, Wasim Ahmad, Habib Hamam
Context
Effort estimation based on user stories plays a pivotal role in agile software development, where accurate prediction of project effort is vital for success. While various supervised machine learning (ML) tools attempt to estimate effort, the prevalence of estimation errors presents significant challenges, as evidenced by the Standish Group's CHAOS report, which identifies incorrect estimation as a contributor to a substantial percentage of failed agile projects.
Objectives
This research delves into the domain of user story-based effort estimation in agile software development, aiming to explore the issues arising from inaccurate estimations. The primary goal is to uncover these issues comprehensively and propose potential solutions, thus enhancing the efficacy of the user story-based estimation method.
Methods
To achieve the research objectives, a systematic literature review (SLR) is conducted, surveying a wide range of sources to gather insights into issues surrounding user story-based effort estimation. The review encompasses diverse estimation methods, user story attributes, and the array of challenges that can result from inaccurate estimations.
Results
The SLR reveals a spectrum of issues undermining the accuracy of user story-based effort estimation. It identifies internal factors like communication, team expertise, and composition as crucial determinants of estimation reliability. Consistency in user stories, technical complexities, and task engineering practices also emerge as significant contributors to estimation inaccuracies. The study underscores the interconnectedness of these issues, emphasizing the need for a standardized protocol to minimize inaccuracies and enhance estimation precision.
Conclusion
In light of the findings, it becomes evident that addressing the multi-dimensional factors influencing user story-based effort estimation is imperative for successful agile software development. The study underscores the interplay of various aspects, such as team dynamics, task complexity, and requirement engineering, in achieving accurate estimations. By recognizing these challenges and implementing recommended solutions, software development processes can avoid failures and enhance their prospects of success in the agile paradigm.
{"title":"Exploring issues of story-based effort estimation in Agile Software Development (ASD)","authors":"Muhammad Iqbal , Muhammad Ijaz , Tehseen Mazhar , Tariq Shahzad , Qamar Abbas , YazeedYasin Ghadi , Wasim Ahmad , Habib Hamam","doi":"10.1016/j.scico.2024.103114","DOIUrl":"10.1016/j.scico.2024.103114","url":null,"abstract":"<div><h3>Context</h3><p>Effort estimation based on user stories plays a pivotal role in agile software development, where accurate predictions of project efforts are vital for success. While various supervised ML tools attempt to estimate effort, the prevalence of estimation errors presents significant challenges, as evidenced by the CHAOS report by the Standish Group, which highlights incorrect estimations contributing to a substantial percentage of failed agile projects.</p></div><div><h3>Objectives</h3><p>This research delves into the domain of user story-based effort estimation in agile software development, aiming to explore the issues arising from inaccurate estimations. The primary goal is to uncover these issues comprehensively and propose potential solutions, thus enhancing the efficacy of the user story-based estimation method.</p></div><div><h3>Methods</h3><p>To achieve the research objectives, a systematic literature review (SLR) is conducted, surveying a wide range of sources to gather insights into issues surrounding user story-based effort estimation. The review encompasses diverse estimation methods, user story attributes, and the array of challenges that can result from inaccurate estimations.</p></div><div><h3>Results</h3><p>The SLR reveals a spectrum of issues undermining the accuracy of user story-based effort estimation. It identifies internal factors like communication, team expertise, and composition as crucial determinants of estimation reliability. Consistency in user stories, technical complexities, and task engineering practices also emerge as significant contributors to estimation inaccuracies. The study underscores the interconnectedness of these issues, emphasizing the need for a standardized protocol to minimize inaccuracies and enhance estimation precision.</p></div><div><h3>Conclusion</h3><p>In light of the findings, it becomes evident that addressing the multi-dimensional factors influencing user story-based effort estimation is imperative for successful agile software development. The study underscores the interplay of various aspects, such as team dynamics, task complexity, and requirement engineering, in achieving accurate estimations. By recognizing these challenges and implementing recommended solutions, software development processes can avoid failures and enhance their prospects of success in the agile paradigm.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"236 ","pages":"Article 103114"},"PeriodicalIF":1.3,"publicationDate":"2024-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140405240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-27 DOI: 10.1016/j.scico.2024.103110
Dongjin Yu, Quanxin Yang, Xin Chen, Jie Chen, Sixuan Wang, Yihang Xu
Code smell detection is one of the essential tasks in the field of software engineering. Identifying whether a code snippet has a code smell is subjective and varies by programming language, developer, and development method. Moreover, developers tend to focus on code smells that have a real impact on development and ignore insignificant ones. However, existing static code analysis tools and code smell detection approaches exhibit a high false positive rate, which lets insignificant smells drown out the smells that developers value. Accurately reporting the actionable code smells that developers are willing to spend effort refactoring can therefore prevent developers from getting lost in a sea of smells and improve refactoring efficiency. In this paper, we aim to detect actionable code smells that developers tend to refactor. Specifically, we first collect actionable and non-actionable code smells from projects with numerous historical versions to construct our datasets. Then, we propose a dual-stream model that fuses code metrics and code semantics to detect actionable code smells. On the one hand, code metrics quantify the code's structure and even some rules or patterns, providing fundamental information for detecting code smells. On the other hand, code semantics encompass information about developers' refactoring tendencies, which proves valuable in detecting actionable code smells. Extensive experiments show that our approach detects actionable code smells more accurately than existing approaches.
Actionable code smell identification with fusion learning of metrics and semantics. Science of Computer Programming, Volume 236, Article 103110.
Pub Date : 2024-03-24 DOI: 10.1016/j.scico.2024.103109
Ana Díaz-Muñoz, Moisés Rodríguez, Mario Piattini
Quantum computing is a revolutionary paradigm in computer science based on the principles of quantum mechanics. It has the potential to solve problems that are currently unsolvable for classical computing. Applications of quantum computing already span a variety of sectors.
Ongoing enhancements to integrated programming and development environments simplify the creation and optimization of quantum algorithms. Ultimately, the focus on supporting tools represents the starting point towards achieving quantum computing maturity, facilitating its transition from an experimental domain to a practical industry.
As quantum software gains ground and relevance in various domains, it is essential to address the evaluation of hybrid systems that combine classical and quantum elements to ensure diverse quality characteristics. However, in the realm of quantum software, models, metrics, and tools are still to be established.
The primary contribution of this paper is to present the first technological environment for measuring and evaluating the analyzability of hybrid software.
Real-world examples of hybrid software are provided to showcase the functionality of the different tools in the environment, yielding readable and representative results for the evaluator.
{"title":"Implementing an environment for hybrid software evaluation","authors":"Ana Díaz-Muñoz , Moisés Rodríguez , Mario Piattini","doi":"10.1016/j.scico.2024.103109","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103109","url":null,"abstract":"<div><p>Quantum computing is a revolutionary paradigm in computer science based on the principles of quantum mechanics. It has the potential to solve problems that are currently unsolvable for classical computing. Applications of quantum computing already span a variety of sectors.</p><p>Ongoing enhancements to the integrated programming and development environment simplify the creation and optimization of quantum algorithms. Ultimately, the focus on supporting tools represents the starting point towards achieving quantum computing maturity, facilitating its transition from an experimental domain to a practical industry.</p><p>As quantum software gains ground and relevance in various domains, it is essential to address the evaluation of hybrid systems that combine classical and quantum elements to ensure diverse quality characteristics. However, in the realm of quantum software, models, metrics, and tools are still to be established.</p><p>The primary contribution of this paper is to present the first technological environment for measuring and evaluating the analyzability of hybrid software.</p><p>Real-world examples of hybrid software are provided to showcase the functionality of the different tools in the environment, yielding readable and representative results for the evaluator.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"236 ","pages":"Article 103109"},"PeriodicalIF":1.3,"publicationDate":"2024-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140350570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-22 DOI: 10.1016/j.scico.2024.103111
Mohammad Amin Kuhail, Sujith Samuel Mathew, Ashraf Khalil, Jose Berengueres, Syed Jawad Hussain Shah
ChatGPT is a language model with artificial intelligence (AI) capabilities that has found utility across various sectors. Given its impact, we conducted two empirical studies to assess the potential and limitations of ChatGPT and other AI tools in software development. In the first study, we evaluated ChatGPT 3.5's effectiveness in generating code for 180 coding problems from LeetCode, an online coding interview preparation platform. Our findings suggest that ChatGPT 3.5 is more effective in solving easy and medium coding problems but less reliable for harder problems. Further, ChatGPT 3.5 is somewhat more effective at coding problems with higher popularity scores. In the second study, we administered a questionnaire (N = 99) to programmers to gain insights into their views on ChatGPT and other AI tools. Our findings indicate that programmers use AI tools for various tasks, such as generating boilerplate code, explaining complex code, and conducting research. AI tools also help programmers to become more productive by creating better-performing, shorter, and more readable code, among other benefits. However, AI tools can sometimes misunderstand requirements and generate erroneous code. While most programmers are not currently concerned about AI tools replacing them, they are apprehensive about what the future may hold. Our research has also revealed associations between AI tool usage, trust, perceived productivity, and job security threats caused by the tools.
{"title":"“Will I be replaced?” Assessing ChatGPT's effect on software development and programmer perceptions of AI tools","authors":"Mohammad Amin Kuhail , Sujith Samuel Mathew , Ashraf Khalil , Jose Berengueres , Syed Jawad Hussain Shah","doi":"10.1016/j.scico.2024.103111","DOIUrl":"https://doi.org/10.1016/j.scico.2024.103111","url":null,"abstract":"<div><p>ChatGPT is a language model with artificial intelligence (AI) capabilities that has found utility across various sectors. Given its impact, we conducted two empirical studies to assess the potential and limitations of ChatGPT and other AI tools in software development. In the first study, we evaluated ChatGPT 3.5′s effectiveness in generating code for 180 coding problems from LeetCode, an online coding interview preparation platform. Our findings suggest that ChatGPT 3.5 is more effective in solving easy and medium coding problems but less reliable for harder problems. Further, ChatGPT 3.5 is somewhat more effective at coding problems with higher popularity scores. In the second study, we administered a questionnaire (<em>N</em> = 99) to programmers to gain insights into their views on ChatGPT and other AI tools. Our findings indicate that programmers use AI tools for various tasks, such as generating boilerplate code, explaining complex code, and conducting research. AI tools also help programmers to become more productive by creating better-performing, shorter, and more readable code, among other benefits. However, AI tools can sometimes misunderstand requirements and generate erroneous code. While most programmers are not currently concerned about AI tools replacing them, they are apprehensive about what the future may hold. Our research has also revealed associations between AI tool usage, trust, perceived productivity, and job security threats caused by the tools.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"235 ","pages":"Article 103111"},"PeriodicalIF":1.3,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140327665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-20 DOI: 10.1016/j.scico.2024.103108
Saba Gholizadeh Ansari, I.S.W.B. Prasetya, Mehdi Dastani, Gabriele Keller, Davide Prandi, Fitsum Meshesha Kifetew, Frank Dignum
As video games become more complex and widespread, player experience (PX) testing becomes crucial in the game industry. Attracting and retaining players are key elements to guarantee the success of a game in the highly competitive market. Although a number of techniques have been introduced to measure the emotional aspect of the experience, automated testing of player experience still needs to be explored. This paper presents PX-MBT, a framework for automated player experience testing with emotion pattern verification. PX-MBT (1) utilizes a model-based testing approach for test suite generation, (2) employs a computational model of emotions, developed based on a psychological theory of emotions, to model players' emotions during game play with an intelligent agent, and (3) verifies emotion patterns given by game designers on executed test suites to identify PX issues. We explain the PX-MBT architecture and provide an example along with its results: emotion pattern verification, which asserts the evolution of emotions over time, and heat-maps showcasing the spatial distribution of emotions on the game map.
{"title":"PX-MBT: A framework for model-based player experience testing","authors":"Saba Gholizadeh Ansari , I.S.W.B. Prasetya , Mehdi Dastani , Gabriele Keller , Davide Prandi , Fitsum Meshesha Kifetew , Frank Dignum","doi":"10.1016/j.scico.2024.103108","DOIUrl":"10.1016/j.scico.2024.103108","url":null,"abstract":"<div><p>As video games become more complex and widespread, player experience (PX) testing becomes crucial in the game industry. Attracting and retaining players are key elements to guarantee the success of a game in the highly competitive market. Although a number of techniques have been introduced to measure the emotional aspect of the experience, automated testing of player experience still needs to be explored. This paper presents <span>PX-MBT</span>, a framework for automated player experience testing with emotion pattern verification. <span>PX-MBT</span> (1) utilizes a model-based testing approach for test suite generation, (2) employs a computational model of emotions developed based on a psychological theory of emotions to model players' emotions during game-plays with an intelligent agent, and (3) verifies emotion patterns given by game designers on executed test suites to identify PX-issues. We explain <span>PX-MBT</span> architecture and provide an example along with its result in emotion pattern verification, which asserts the evolution of emotions over time, and heat-maps to showcase the spatial distribution of emotions on the game map.</p></div>","PeriodicalId":49561,"journal":{"name":"Science of Computer Programming","volume":"236 ","pages":"Article 103108"},"PeriodicalIF":1.3,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167642324000315/pdfft?md5=3feb08ed6c236db63ae3355a5f46a72f&pid=1-s2.0-S0167642324000315-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140280471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-05 DOI: 10.1016/j.scico.2024.103105
Yukun Dong, Xiaotong Cheng, Yufei Yang, Lulu Zhang, Shuqi Wang, Lingjie Kong
The primary aim of Automatic Program Repair (APR) is to repair defective programs automatically, with the intention of reducing the effort required of developers. However, APR techniques may produce overfitting patches that pass all test cases yet do not truly repair the program. This paper provides a comprehensive review of the overfitting problem and adds to the existing research on overfitting in conditional statements. Our proposed method, ETPAT (Expression Tree-based Patch Assessment Technique), uses expression trees and targeted coverage criteria to identify differences between the original and the patched program. We utilize ETPAT to verify test case adequacy. In parallel, ETPAT also guides the generation of corresponding test cases via equivalence class information; these test cases may be added to the original test suite, making it more robust while also preventing the repair technique from generating comparable overfitting patches. On the patch set of the BuggyJavaJML benchmark, ETPAT recognized 77 of 82 (93.9%) overfitting patches among 120 patches related to conditional constraints, achieving higher accuracy while requiring fewer test cases than the original repair tool.
A method to identify overfitting program repair patches based on expression tree. Science of Computer Programming, Volume 235, Article 103105.