Automatic generation of efficient oracles: The less-than case
Pub Date: 2024-09-04 | DOI: 10.1016/j.jss.2024.112203
Grover’s algorithm is a well-known contribution to quantum computing. It searches for one value within an unordered sequence faster than any classical algorithm. A fundamental part of this algorithm is the so-called oracle, a quantum circuit that marks the quantum state corresponding to the desired value. A generalisation of it is the oracle for Amplitude Amplification, which marks multiple desired states. In this work we present a classical algorithm that builds a phase-marking oracle for Amplitude Amplification. This oracle performs a less-than operation, marking the states that represent natural numbers smaller than a given one. Results of both simulations and experiments are shown to demonstrate its functionality. This less-than oracle implementation works on any number of qubits and does not require any ancilla qubits. Regarding depth, the proposed implementation is compared with the one generated by Qiskit's automatic method, Diagonal. We show that the depth of our less-than oracle implementation is always lower. In addition, a comparison with another oracle-generation method in terms of gate count is also conducted to demonstrate the efficiency of our method. The result presented here is part of a research effort that aims to achieve reusable quantum operations that can be composed to perform more complex ones. The final aim is to provide quantum developers with tools that can be easily integrated into their programs and circuits.
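To make the comparison concrete, here is a minimal sketch, assuming Qiskit is available, of how a less-than phase-marking oracle can be built with Qiskit's Diagonal method (the baseline the depth comparison uses); it is not the paper's construction, and the qubit count and threshold are illustrative only.

```python
# Sketch: less-than phase oracle via Qiskit's Diagonal (baseline method).
# States |i> with i < k receive a -1 phase; all other states are unchanged.
from qiskit import QuantumCircuit, transpile
from qiskit.circuit.library import Diagonal

n, k = 3, 5                                    # 3 qubits; mark states 0..4
diag = [-1 if i < k else 1 for i in range(2 ** n)]
oracle = Diagonal(diag)                        # phase-marking oracle circuit

circuit = QuantumCircuit(n)
circuit.h(range(n))                            # uniform superposition
circuit.compose(oracle, qubits=range(n), inplace=True)

# Depth of the Diagonal-based oracle after transpilation to a basis gate set.
print(transpile(circuit, basis_gates=["u", "cx"]).depth())
```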
{"title":"Automatic generation of efficient oracles: The less-than case","authors":"","doi":"10.1016/j.jss.2024.112203","DOIUrl":"10.1016/j.jss.2024.112203","url":null,"abstract":"<div><p>Grover’s algorithm is a well-known contribution to quantum computing. It searches one value within an unordered sequence faster than any classical algorithm. A fundamental part of this algorithm is the so-called oracle, a quantum circuit that marks the quantum state corresponding to the desired value. A generalisation of it is the oracle for Amplitude Amplification, that marks multiple desired states. In this work we present a classical algorithm that builds a phase-marking oracle for Amplitude Amplification. This oracle performs a less-than operation, marking states representing natural numbers smaller than a given one. Results of both simulations and experiments are shown to prove its functionality. This less-than oracle implementation works on any number of qubits and does not require any ancilla qubits. Regarding depth, the proposed implementation is compared with the one generated by Qiskit automatic method, <em>Diagonal</em>. We show that the depth of our less-than oracle implementation is always lower. In addition, a comparison with another method for oracle generation in terms of gate count is also conducted to prove the efficiency of our method. The result presented here is part of a research work that aims to achieve reusable quantum operations that can be composed to perform more complex ones. The final aim is to provide Quantum Developers with tools that can be easily integrated in their programs/circuits.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0164121224002474/pdfft?md5=045c01aa1d649973c047ead28c3395fc&pid=1-s2.0-S0164121224002474-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142161758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring the problems, their causes and solutions of AI pair programming: A study on GitHub and Stack Overflow
Pub Date: 2024-09-02 | DOI: 10.1016/j.jss.2024.112204
With the recent advancement of Artificial Intelligence (AI) and Large Language Models (LLMs), AI-based code generation tools have become a practical solution for software development. GitHub Copilot, the AI pair programmer, utilizes machine learning models trained on a large corpus of code snippets to generate code suggestions using natural language processing. Despite its popularity in software development, there is limited empirical evidence on the actual experiences of practitioners who work with Copilot. To this end, we conducted an empirical study to understand the problems that practitioners face when using Copilot, as well as their underlying causes and potential solutions. We collected data from 473 GitHub issues, 706 GitHub discussions, and 142 Stack Overflow posts. Our results reveal that (1) Operation Issue and Compatibility Issue are the most common problems faced by Copilot users, (2) Copilot Internal Error, Network Connection Error, and Editor/IDE Compatibility Issue are identified as the most frequent causes, and (3) Bug Fixed by Copilot, Modify Configuration/Setting, and Use Suitable Version are the predominant solutions. Based on these results, we discuss potential areas of Copilot for enhancement and provide implications for Copilot users, the Copilot team, and researchers.
{"title":"Exploring the problems, their causes and solutions of AI pair programming: A study on GitHub and Stack Overflow","authors":"","doi":"10.1016/j.jss.2024.112204","DOIUrl":"10.1016/j.jss.2024.112204","url":null,"abstract":"<div><p>With the recent advancement of Artificial Intelligence (AI) and Large Language Models (LLMs), AI-based code generation tools become a practical solution for software development. GitHub Copilot, the AI pair programmer, utilizes machine learning models trained on a large corpus of code snippets to generate code suggestions using natural language processing. Despite its popularity in software development, there is limited empirical evidence on the actual experiences of practitioners who work with Copilot. To this end, we conducted an empirical study to understand the problems that practitioners face when using Copilot, as well as their underlying causes and potential solutions. We collected data from 473 GitHub issues, 706 GitHub discussions, and 142 Stack Overflow posts. Our results reveal that (1) <em>Operation Issue</em> and <em>Compatibility Issue</em> are the most common problems faced by Copilot users, (2) <em>Copilot Internal Error</em>, <em>Network Connection Error</em>, and <em>Editor/IDE Compatibility Issue</em> are identified as the most frequent causes, and (3) <em>Bug Fixed by Copilot</em>, <em>Modify Configuration/Setting</em>, and <em>Use Suitable Version</em> are the predominant solutions. Based on the results, we discuss the potential areas of Copilot for enhancement, and provide the implications for the Copilot users, the Copilot team, and researchers.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142150765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepFeature: Guiding adversarial testing for deep neural network systems using robust features
Pub Date: 2024-08-31 | DOI: 10.1016/j.jss.2024.112201
With the deployment of Deep Neural Network (DNN) systems in security-critical fields, more and more researchers are concerned about DNN robustness. Unfortunately, DNNs are vulnerable to adversarial attacks and produce completely wrong outputs. This has inspired numerous testing works devoted to improving the adversarial robustness of DNNs. Coverage and uncertainty criteria have been proposed to guide sample selection for DNN retraining. However, they are largely limited to evaluating abnormal DNN behaviors rather than locating the root cause of adversarial vulnerability. This work aims to bridge this gap. We propose an adversarial testing framework, DeepFeature, based on robust features. DeepFeature generates robust features related to the model's decision-making. Within these features, it locates the weak features that the DNN fails to transform; they are the main culprits of vulnerability. DeepFeature then selects diverse samples containing weak features for adversarial retraining. Our evaluation shows that DeepFeature significantly improves the overall robustness (by 77.83% on average) and individual robustness (by 42.81‰ on average) of the models under adversarial testing. Compared with coverage and uncertainty criteria, these two metrics are improved by 3.93% and 15.00%, respectively. The positive correlation coefficient between DeepFeature and improved robustness reaches 0.858, with a p-value of 0.001.
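As an aside on the reported statistic, the following is a minimal sketch of how such a correlation coefficient and p-value are typically computed with SciPy; the data points are made up for illustration and do not come from the paper.

```python
# Sketch: Pearson correlation between a DeepFeature-style selection score and
# an observed robustness improvement (the paper reports r = 0.858, p = 0.001).
from scipy.stats import pearsonr

selection_scores = [0.12, 0.35, 0.41, 0.58, 0.63, 0.77, 0.81, 0.90]  # hypothetical
robustness_gain = [1.2, 2.9, 3.1, 4.8, 5.0, 6.4, 6.9, 7.7]           # hypothetical

r, p_value = pearsonr(selection_scores, robustness_gain)
print(f"r = {r:.3f}, p = {p_value:.3f}")  # strong positive correlation on this toy data
```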
{"title":"DeepFeature: Guiding adversarial testing for deep neural network systems using robust features","authors":"","doi":"10.1016/j.jss.2024.112201","DOIUrl":"10.1016/j.jss.2024.112201","url":null,"abstract":"<div><p>With the deployment of Deep Neural Network (DNN) systems in security-critical fields, more and more researchers are concerned about DNN robustness. Unfortunately, DNNs are vulnerable to adversarial attacks and produce completely wrong outputs. This inspired numerous testing works devoted to improving the adversarial robustness of DNNs. Coverage and uncertainty criteria were proposed to guide sample selections for DNN retraining. However, they are greatly limited to evaluating DNN abnormal behaviors rather than locating the root cause of adversarial vulnerability. This work aims to bridge this gap. We propose an adversarial testing framework, DeepFeature, using robust features. DeepFeature generates robust features related to the model decision-making. It locates the weak features within these features that fail to be transformed by the DNN. They are the main culprits of vulnerability. DeepFeature selects diverse samples containing weak features for adversarial retraining. Our evaluation shows that DeepFeature significantly improves overall robustness, average improved by 77.83%, and individual robustness, average improved by 42.81‰, of the models in adversarial testing. Compared with coverage and uncertainty criteria, these two performances are improved by 3.93% and 15.00% in DeepFeature, respectively. The positive correlation coefficient between DeepFeature and improved robustness can achieve 0.858, and the <span><math><mi>p</mi></math></span>-value is 0.001.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142167885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How are discussions linked? A link analysis study on GitHub Discussions
Pub Date: 2024-08-30 | DOI: 10.1016/j.jss.2024.112196
Software development requires collaborative efforts and consensus among developers, emphasizing the need for effective communication and knowledge sharing within teams. In line with this, GitHub introduced GitHub Discussions, a collaborative communication feature for the community around an open-source or internal project. The user-friendly interface of Discussions facilitates easy sharing of links to different resources. Maintainers highlight this feature's significance in enabling users to maintain a well-informed project environment. We hypothesize that link-sharing activities in Discussions contribute to disseminating project knowledge. To investigate this hypothesis, we conducted a mixed-method study combining qualitative and quantitative analysis based on a convenience sample of ten open-source projects. We aimed to gain insight into the scope and intentions behind these shared links. Our findings indicate that link-sharing activities are common in Discussions. Users share links to resources directly or indirectly related to the project/repository. Discussions users share links to project documentation, source code, and issues, aiming to clarify concepts, provide supplementary information, offer context to questions, and suggest new features. These findings offer insights that help project maintainers better understand Discussions usage and enable the GitHub Engineering team to promote adoption of the feature.
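For illustration, the sketch below shows one hypothetical way to extract and coarsely classify shared links from discussion post bodies; it is not the authors' tooling, and the categories and heuristics are assumptions that merely mirror the resource types named in the abstract (documentation, source code, issues).

```python
# Sketch: extract URLs from discussion posts and classify them heuristically.
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")

def classify(url: str) -> str:
    """Assign a coarse, illustrative category to a shared link."""
    if "/issues/" in url:
        return "issue"
    if "/blob/" in url or "/tree/" in url:
        return "source code"
    if "docs." in url or "/wiki/" in url:
        return "documentation"
    return "other"

def link_stats(posts: list[str]) -> Counter:
    """Count link categories across a list of discussion post bodies."""
    counts = Counter()
    for body in posts:
        for url in URL_RE.findall(body):
            counts[classify(url)] += 1
    return counts

# Example with made-up posts:
posts = [
    "See https://docs.example.org/setup for install steps.",
    "This was fixed in https://github.com/org/repo/issues/42",
]
print(link_stats(posts))
```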
{"title":"How are discussions linked? A link analysis study on GitHub Discussions","authors":"","doi":"10.1016/j.jss.2024.112196","DOIUrl":"10.1016/j.jss.2024.112196","url":null,"abstract":"<div><p>Software development requires collaborative efforts and consensus among developers, emphasizing the need for effective communication and knowledge sharing within the teams. In line with this, GitHub introduced GitHub Discussions, a collaborative communication feature for the community around an open-source or internal project. The user-friendly interface of Discussions facilitates easy sharing of links to different resources. Maintainers highlight this feature’s significance, enabling users to maintain a well-informed project environment. We hypothesize that link-sharing activities in Discussions contribute to disseminating project knowledge. To investigate this hypothesis, we conducted a mixed-method study combining qualitative and quantitative analysis based on a convenience sample of ten open-source projects. We aimed to gain insight into the scope and intentions behind these shared links. Our findings indicate that link-sharing activities are common in the Discussions. Users share links to resources directly or indirectly related to the project/repository. Discussions users share links to project documentation, source code, and issues, aiming to clarify concepts, provide supplementary information, offer context to questions, and suggest new features. These findings offer insights for project maintainers to understand Discussions usage better and enable the GitHub Engineering team to promote the feature adoption.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142150764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software modernization powered by dynamic language product lines
Pub Date: 2024-08-28 | DOI: 10.1016/j.jss.2024.112188
Legacy software poses a critical challenge for organizations due to the costs of maintaining and modernizing outdated systems, as well as the scarcity of experts in aging programming languages. The issue extends beyond commercial applications, affecting public administration, as exemplified by the urgent need for COBOL programmers during the COVID-19 pandemic. In response, this work introduces a modernization approach based on dynamic language product lines, a subset of dynamic software product lines. This approach leverages open language implementations and dynamically generated micro-languages for the incremental migration of legacy systems to modern technologies. The language can be reconfigured at runtime to adapt to the execution of either legacy or modern code, and to generate a compatibility layer between the data types handled by the two languages. Through this process, the costs of modernizing legacy systems can be spread across several iterations, as developers can replace legacy code incrementally, with legacy and modern code coexisting until a complete refactoring is possible. By moving the overhead of making legacy and modern features work together in a hybrid system from the system implementation to the language implementation, the quality of the system itself does not degrade due to the introduction of glue code. To demonstrate the practical applicability of this approach, we present a case study on a COBOL system migration to Java. Using the Neverlang language workbench to create modular and reconfigurable language implementations, both the COBOL interpreter and the application evolve to spread the development effort across several iterations. Through this study, this work presents a viable solution for organizations dealing with the complexity of modernizing legacy software to contemporary technologies. The contributions of this work are (i) a language-oriented, incremental refactoring process for legacy systems, (ii) a concrete application of open language implementations, and (iii) a general template for the implementation of interoperability between languages in hybrid systems.
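As a rough illustration of the data-type compatibility layer idea, the sketch below converts a COBOL-style DISPLAY numeric field to and from a native decimal value; it is a hypothetical, hand-written example of such glue, whereas the paper's approach generates the compatibility layer at the language level with Neverlang.

```python
# Sketch: converting a COBOL PIC 9(n)V9(scale) DISPLAY field to and from a
# native Decimal, so legacy and modern code can share data during migration.
from decimal import Decimal

def from_display_numeric(raw: str, scale: int = 0) -> Decimal:
    """Decode a fixed-width DISPLAY numeric field into a Decimal."""
    digits = raw.strip()
    if scale:
        digits = digits[:-scale] + "." + digits[-scale:]
    return Decimal(digits)

def to_display_numeric(value: Decimal, width: int, scale: int = 0) -> str:
    """Encode a Decimal back into a fixed-width DISPLAY field for legacy code."""
    quantized = value.quantize(Decimal(1).scaleb(-scale))
    digits = f"{quantized:f}".replace(".", "").lstrip("-")
    return digits.rjust(width, "0")

# A legacy field declared PIC 9(5)V99 holding "0012345" means 123.45.
amount = from_display_numeric("0012345", scale=2)
print(amount)                                         # 123.45
print(to_display_numeric(amount, width=7, scale=2))   # 0012345
```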
{"title":"Software modernization powered by dynamic language product lines","authors":"","doi":"10.1016/j.jss.2024.112188","DOIUrl":"10.1016/j.jss.2024.112188","url":null,"abstract":"<div><p>Legacy software poses a critical challenge for organizations due to the costs of maintaining and modernizing outdated systems, as well as the scarcity of experts in aging programming languages. The issue extends beyond commercial applications, affecting public administration, as exemplified by the urgent need for <span>COBOL</span> programmers during the COVID-19 pandemic. In response, this work introduces a modernization approach based on dynamic language product lines, a subset of dynamic software product lines. This approach leverages open language implementations and dynamically generated micro-languages for the incremental migration of legacy systems to modern technologies. The language can be reconfigured at runtime to adapt to the execution of either legacy or modern code, and to generate a compatibility layer between the data types handled by the two languages. Through this process, the costs of modernizing legacy systems can be spread across several iterations, as developers can replace legacy code incrementally, with legacy and modern code coexisting until a complete refactoring is possible. By moving the overhead of making legacy and modern features work together in a hybrid system from the system implementation to the language implementation, the quality of the system itself does not degrade due to the introduction of glue code. To demonstrate the practical applicability of this approach, we present a case study on a <span>COBOL</span> system migration to <span>Java</span>. Using the <span>Neverlang</span> language workbench to create modular and reconfigurable language implementations, both the <span>COBOL</span> interpreter and the application evolve to spread the development effort across several iterations. Through this study, this work presents a viable solution for organizations dealing with the complexity of modernizing legacy software to contemporary technologies. The contributions of this work are (i) a language-oriented, incremental refactoring process for legacy systems, (ii) a concrete application of open language implementations, and (iii) a general template for the implementation of interoperability between languages in hybrid systems.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0164121224002322/pdfft?md5=087cacb89bcf9dc4a2cefae984eef08a&pid=1-s2.0-S0164121224002322-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research artifacts for human-oriented experiments in software engineering: An ACM badges-driven structure proposal
Pub Date: 2024-08-27 | DOI: 10.1016/j.jss.2024.112187
Context:
The Open Science (OS) movement promotes the value of making public the research artifacts (datasets, analysis scripts, guidelines, etc.) used during empirical studies. OS is widely known in areas such as Medicine or Biology, where the process of sharing research artifacts is subject to strict protocols. Unfortunately, in Software Engineering (SE), this process is carried out in a non-systematic way, resulting in incomplete or inaccurate material shared by researchers, which hinders the reproducibility and replicability of empirical studies. Nevertheless, in recent years, it seems that the Empirical Software Engineering (ESE) community is embracing some of the proposed OS initiatives, such as the one proposed by the Association for Computing Machinery (ACM), which provides a badge system to evaluate the quality of a research artifact. This badge system has been adopted by several SE conferences as a method of assessing research artifacts.
Aims:
Focusing on human-oriented experiments (HOEs) in SE, whose research artifacts are more complex than those for computational experiments, this work applies Design Science Research (DSR) with a twofold purpose: (i) review the current status of HOE research artifact publication through an evaluation of this practice in the most relevant ESE journals, and (ii) propose a structured outline for HOE research artifacts driven by the aforementioned ACM badging policy.
Method:
Regarding the first purpose, we carried out a survey to analyze the current status of research artifact publication in relevant peer-reviewed journals and the quality of 106 research artifacts published in these journals with respect to the ACM badging policy. For the second purpose, an iterative process was carried out to review the content of those 106 research artifacts and their concordance with ACM badges, obtaining a structured scheme for HOE research artifacts that was validated through a detailed review of 12 research artifacts awarded ACM badges at relevant SE conferences. In addition, we validated the proposal on the research artifacts of two of our own experiments.
Results:
Our survey reveals several issues: only 39.70% of journal studies make their research artifacts completely accessible; most of the analyzed research artifacts are incomplete; and the most common repositories used in the ESE community to share research artifacts are GitHub, institutional repositories, and Zenodo. On the other hand, the validated and structured research artifact outline consists of a list of ordered sections containing a set of artifacts, each of which may or may not be mandatory to achieve a particular ACM badge. For its internal validation, several improvement iterations on the first release of the outline were carried out based on the conference guidelines, the ACM badging policy, and other relevant proposals.
Conclusions:
Despite the significant efforts of the ESE community regarding OS-related standardization, review, and digital publication, the availability and completeness of research artifacts still need to improve. Our proposal of a structured research artifact outline meets the requirements of HOEs in SE. However, further work is needed not only to refine and externally validate it, but also to promote its adoption within the research community.
{"title":"Research artifacts for human-oriented experiments in software engineering: An ACM badges-driven structure proposal","authors":"","doi":"10.1016/j.jss.2024.112187","DOIUrl":"10.1016/j.jss.2024.112187","url":null,"abstract":"<div><h3>Context:</h3><p>The Open Science (OS) movement promotes the value of making public the research artifacts (datasets, analysis scripts, guidelines, etc.) used during empirical studies. OS is widely known in areas such as Medicine or Biology, where the process of sharing research artifacts is subject to strict protocols. Unfortunately, in Software Engineering (SE), this process is carried out in a non-systematic way, resulting in incomplete or inaccurate material shared by researchers, which hinders the reproducibility and replicability of empirical studies. Nevertheless, in recent years, it seems that the Empirical Software Engineering (ESE) community is embracing some of the proposed OS initiatives, such as the one proposed by the Association for Computing Machinery (ACM), which provides a badge system to evaluate the quality of a research artifact. This badge system has been adopted by several SE conferences as a method of assessing research artifacts.</p></div><div><h3>Aims:</h3><p>Focusing on human-oriented experiments (HOEs) in SE, whose research artifacts are more complex than those for computational experiments, this work applies Design Science Research (DSR) with a twofold purpose: (i) review the current status of HOEs research artifacts publication through evaluation of this practice in the most relevant ESE journals , and (ii) propose a structured outline for HOEs research artifacts driven by the aforementioned ACM badging policy.</p></div><div><h3>Method:</h3><p>Regarding the first purpose, we carried out a survey to analyze the current status of the publication of research artifacts considering relevant peer review journals and the quality of 106 research artifacts published in these journals with respect to the ACM badging policy. For the second purpose, an iterative process was carried out to review the content of 106 research artifacts research and their concordance with ACM badges, obtaining a structured scheme for HOEs research artifacts that has been validated through a detailed review of 12 research artifacts obtained from some of those of ACM badges in relevant SE conferences. In addition, we validated the proposal in the research artifacts of 2 of our own experiments.</p></div><div><h3>Results:</h3><p>Our survey reveals issues such as the 39,70% of journal studies making completely accessible their research artifacts; most of the analyzed research artifacts are incomplete; the most common repositories used in the ESE community to share the research artifacts are GitHub, institutional repositories, and Zenodo. On the other hand, the validated and structured research artifact outline consists of a list of ordered sections containing a set of artifacts, which can be mandatory or not to achieve a particular ACM badge. 
For its internal validation, several improvement iterations on the first release of the outline have been carried out based on the conference guidelines, the ACM badging policy, and other relevant proposals.","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0164121224002310/pdfft?md5=fb4a53f470e5e9d69349ac3f01883dd0&pid=1-s2.0-S0164121224002310-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142087631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GBSR: Graph-based suspiciousness refinement for improving fault localization
Pub Date: 2024-08-26 | DOI: 10.1016/j.jss.2024.112189
Fault Localization (FL) is an important and time-consuming phase of software debugging. The essence of FL lies in calculating the suspiciousness of different program entities (e.g., statements) and generating a ranking list to guide developers in their code inspection. Nonetheless, a prevalent challenge within existing FL methodologies is the propensity for program entities with analogous execution information to receive similar suspiciousness scores. This phenomenon can confuse developers, thereby significantly reducing the effectiveness of debugging. To alleviate this issue, we introduce fine-grained contextual information (such as partial code structure, coverage, and features from mutation analysis) to enrich the characteristics of program entities. Graphical structures are proposed to organize this information, in which the passed and failed tests are handled separately in consideration of their differential impacts. To support the analysis of multidimensional features and the representation of large-scale programs, the PageRank algorithm is adopted to compute each program entity's weight. Rather than altering the fundamental FL process, we leverage these computed weights to refine the suspiciousness produced by various FL techniques, thereby providing developers with a more precise and actionable ranking of potential fault locations. The proposed strategy, Graph-Based Suspiciousness Refinement (GBSR), is evaluated on 243 real-world faulty programs from Defects4J. The results demonstrate that GBSR can improve the accuracy of various FL techniques. Specifically, for the refinement of traditional SBFL and MBFL techniques, the number of faults localized at the first position of the ranking list (Top-1) is increased by 189% and 68%, respectively. Furthermore, GBSR can also boost the state-of-the-art learning-based FL technique Grace, achieving a 2.8% performance improvement in Top-1.
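A minimal sketch of the weighting idea follows, assuming networkx is available: PageRank weights are computed over a toy graph linking tests to the statements they cover, and an existing SBFL suspiciousness score is scaled by the resulting weight. GBSR's actual graphs are richer (separate structures for passed and failed tests, mutation features), so this only illustrates the general strategy, not the paper's exact construction.

```python
# Sketch: PageRank weights over a test/statement coverage graph, used to
# refine a baseline SBFL suspiciousness score (toy data throughout).
import networkx as nx

coverage = {
    "t_fail_1": ["s1", "s2", "s3"],
    "t_pass_1": ["s1", "s4"],
    "t_pass_2": ["s4", "s5"],
}
sbfl = {"s1": 0.58, "s2": 0.71, "s3": 0.71, "s4": 0.20, "s5": 0.20}

graph = nx.Graph()
for test, stmts in coverage.items():
    for stmt in stmts:
        graph.add_edge(test, stmt)

weights = nx.pagerank(graph)                        # importance of each node
refined = {s: sbfl[s] * weights[s] for s in sbfl}   # weight-adjusted scores

for stmt, score in sorted(refined.items(), key=lambda kv: -kv[1]):
    print(stmt, round(score, 4))
```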
{"title":"GBSR: Graph-based suspiciousness refinement for improving fault localization","authors":"","doi":"10.1016/j.jss.2024.112189","DOIUrl":"10.1016/j.jss.2024.112189","url":null,"abstract":"<div><p>Fault Localization (FL) is an important and time-consuming phase of software debugging. The essence of FL lies in the process of calculating the suspiciousness of different program entities (e.g., statements) and generating a ranking list to guide developers in their code inspection. Nonetheless, a prevalent challenge within existing FL methodologies is the propensity for program entities with analogous execution information to receive a similar suspiciousness. This phenomenon can lead to confusion among developers, thereby reducing the effectiveness of debugging significantly. To alleviate this issue, we introduce fine-grained contextual information (such as partial code structural, coverage, and features from mutation analysis) to enrich the characteristics of program entities. Graphical structures are proposed to organize such information, where the passed and failed tests are constructed separately with the consideration of their differential impacts. In order to support the analysis of multidimensional features and the representation of large-scale programs, the PageRank algorithm is adopted to compute each program entity’s weight. Rather than altering the fundamental FL process, we leverage these computed weights to refine the suspiciousness produced by various FL techniques, thereby providing developers with a more precise and actionable ranking of potential fault locations. The proposed strategy Graph-Based Suspiciousness Refinement (GBSR) is evaluated on 243 real-world faulty programs from the Defects4J. The results demonstrate that GBSR can improve the accuracy of various FL techniques. Specifically, for the refinement with traditional SBFL and MBFL techniques, the number of faults localized by the first position of the ranking list (<span><math><mrow><mi>T</mi><mi>o</mi><mi>p</mi></mrow></math></span>-1) is increased by 189% and 68%, respectively. Furthermore, GBSR can also boost the state-of-the-art learning-based FL technique Grace by achieving a 2.8% performance improvement in <span><math><mrow><mi>T</mi><mi>o</mi><mi>p</mi></mrow></math></span>-1.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142098169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Web application testing—Challenges and opportunities
Pub Date: 2024-08-17 | DOI: 10.1016/j.jss.2024.112186
Context:
A large part of the software produced by many companies and organizations today consists of web applications. Testing web applications is vital to ensure and maintain the quality of these systems. They play an important role in promoting brands and enabling better communication with customers.
Objective:
Although the landscape of web application techniques has changed, there is a gap in the research literature of recent years regarding the testing of web applications. New methods, frameworks, environments and techniques have recently been used both for developing and for testing these applications. This paper presents an overview of the research directions, problems and challenges in the field of web application testing in the last decade. Our paper investigates current implementation and validation techniques and the quality of existing approaches, and reveals areas of incomplete or superficial research.
Methods:
In this paper, a systematic literature review on the state of web application testing has been conducted. Starting from approximately 6000 papers extracted from Science Direct, Springer Link, Web of Science, IEEE Xplore and ACM, we retained a final set of 72 papers after a filtering process. The extracted papers were examined for demographics, problems, techniques and tools. We looked at the quality, the empirical evidence and the test applications used for validating the different methods in the extracted papers.
Results:
The most important journals, authors, tools and research directions in this field are identified, a deep analysis of quality, rigor and empirical evidence is given, and the most important validation applications are described. We found that three groups of authors contributed more than 25% of the papers and that the three most important journals published more than 50% of the papers. 30% of the developed tools were openly accessible. Most papers had a good description of the study design and threats to validity, but had little industrial relevance. Only 6 papers performed validation on industrial applications. Only 40% of the papers compared their technique with other existing techniques, and the applications used for validation are usually outdated.
Conclusions:
We discuss trends and challenges in web application testing research. We also show gaps in research and areas that need more attention from the research community. Research in web application testing needs a stronger focus on industrial relevance and scalability so that its usability for industry can be assessed. New techniques should be validated on modern test application frameworks to obtain comparable results. The results can help researchers get an overview of publication venues, active researchers, and current research gaps and problems in the field.
{"title":"Web application testing—Challenges and opportunities","authors":"","doi":"10.1016/j.jss.2024.112186","DOIUrl":"10.1016/j.jss.2024.112186","url":null,"abstract":"<div><h3>Context:</h3><p>A large part of the software produced by many companies and organizations today are web applications. Testing web applications is vital to ensure and maintain the quality of these systems. They play an important role in promoting brands and enabling better communication with customers.</p></div><div><h3>Objective:</h3><p>There is a gap in existing literature research in recent years regarding testing of web applications, although the landscape of web applications techniques has changed. New methods, frameworks, environments and techniques have recently been used both for developing and testing these applications. This paper presents an overview of the research directions, problems and challenges in the field of Web application testing in the last decade. Our paper investigates current implementation and validation techniques, the quality of existing approaches and reveals areas of incomplete or superficial research.</p></div><div><h3>Methods:</h3><p>In this paper, a systematic literature review about the state of web application testing has been conducted. Based on initially about 6000 papers, that were extracted from Science Direct, Springer Link, WebOfScience, IEEE Explore and ACM, we used a final number of 72 papers after a filtration process for this literature review. The extracted papers were examined for demographics, problems, techniques and tools. We looked at the quality, the empirical evidence and the test application used for validating the different methods in the extracted papers.</p></div><div><h3>Results:</h3><p>The most important journals, authors, tools and research directions in this field are discovered, a deep analysis of quality, rigor and empirical evidence is given, and the most important validation applications are described. We found that three groups of authors contributed to more than 25% of the papers, the three most important journals published more than 50% of the papers. 30% of the developed tools were open accessible. Most papers had a good description of study design and threats for validity, but have little industrial relevance. Only 6 papers validated on industrial applications. Only 40% of these papers compared their technique with other existing techniques, and applications used for validation are usually outdated.</p></div><div><h3>Conclusions:</h3><p>We discuss trends and challenges in research in web application testing. We also show gaps in research and areas that need more attention from the research community. Research in web application testing needs more focus on industrial relevance and scalability to analyze the usability for industry. New techniques should be validated on modern test application frameworks to get comparable results. 
The results can help researchers to get an overview of publication venues, active researchers, current research gaps and problems in the field.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0164121224002309/pdfft?md5=3ef92544adb97aa90f903887254cfc82&pid=1-s2.0-S0164121224002309-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142167884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards a common data-driven culture: A longitudinal study of the tensions and emerging solutions involved in becoming data-driven in a large public sector organization
Pub Date: 2024-08-17 | DOI: 10.1016/j.jss.2024.112185
In recent years, the push to make organizations data-driven has led to data-focused software projects in both the private and public sectors. The drive for more data-driven initiatives introduces a range of new socio-technical challenges, yet to date there are few empirical studies of how data-focused initiatives affect large organizations with significant variation in data needs and usage. This study presents a longitudinal descriptive case study of how data-driven initiatives in the Norwegian public sector cause organizational tensions in a very large, complex organization. We conducted 32 semi-structured interviews over a period of 18 months with members of two data-intensive parts of the organization that had developed incompatible data cultures. Our study shows that these cultural differences create organizational conflicts that hinder data-driven initiatives. The findings also suggest, however, that these can be overcome through the strategic, top-down facilitation of a common data-driven culture built on uniting data principles, in turn potentially leading to improved decision-making and enhanced innovation.
{"title":"Towards a common data-driven culture: A longitudinal study of the tensions and emerging solutions involved in becoming data-driven in a large public sector organization","authors":"","doi":"10.1016/j.jss.2024.112185","DOIUrl":"10.1016/j.jss.2024.112185","url":null,"abstract":"<div><p>In recent years, the push to make organizations data-driven has led to data-focused software projects, both in the private and public sectors. The strive for increasing data-driven initiatives introduces a range of new socio-technical challenges, yet there are to date few empirical studies in terms of how data-focused initiatives affect large organizations with significant variations in terms of data needs and usage. This study presents a longitudinal descriptive case study of how data-driven initiatives in the Norwegian public sector cause organizational tensions in a very large, complex organization. We conducted 32 semi-structured interviews over a period of 18 months representing two different data-intensive parts of the organization that had developed incompatible data cultures. Our study shows that these cultural differences create organizational conflicts that hinder data-driven initiatives. The findings also suggest, however, that overcoming these is possible through the strategic, top-down facilitation of a common data-driven culture built on uniting data principles, in turn potentially leading to improved decision-making and enhanced innovation.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142098170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive data quality scoring operations framework using drift-aware mechanism for industrial applications
Pub Date: 2024-08-14 | DOI: 10.1016/j.jss.2024.112184
Within data-driven artificial intelligence (AI) systems for industrial applications, ensuring the reliability of incoming data streams is an integral part of trustworthy decision-making. One approach to assessing data validity is data quality scoring, which assigns a score to each data point or stream based on various quality dimensions. However, certain dimensions exhibit dynamic qualities, which require adaptation on the basis of the system's current conditions. Existing methods often overlook this aspect, making them inefficient in dynamic production environments. In this paper, we introduce the Adaptive Data Quality Scoring Operations Framework, a novel framework developed to address the challenges posed by dynamic quality dimensions in industrial data streams. The framework introduces an innovative approach by integrating a dynamic change detector mechanism that actively monitors and adapts to changes in data quality, ensuring the relevance of quality scores. We evaluate the proposed framework's performance in a real-world industrial use case. The experimental results reveal high predictive performance and efficient processing time, highlighting its effectiveness in practical quality-driven AI applications.
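As a self-contained illustration of the adaptation loop, the sketch below scores a completeness dimension per batch and resets its baseline when a simple windowed mean-shift check signals drift; the class name and threshold are hypothetical, and the paper's framework relies on a dedicated drift detector rather than this toy check.

```python
# Sketch: drift-aware scoring of a completeness quality dimension. When the
# latest batch's completeness deviates strongly from the recent window, the
# baseline is reset so subsequent scores reflect the new data regime.
from collections import deque
from statistics import mean, pstdev

class DriftAwareQualityScorer:
    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.history = deque(maxlen=window)  # recent completeness values
        self.threshold = threshold           # z-score that signals drift

    def score(self, batch: list[float | None]) -> float:
        """Score a batch on completeness and adapt the baseline on drift."""
        completeness = sum(v is not None for v in batch) / len(batch)
        if len(self.history) >= self.history.maxlen:
            mu, sigma = mean(self.history), pstdev(self.history) or 1e-9
            if abs(completeness - mu) / sigma > self.threshold:
                self.history.clear()          # drift detected: reset baseline
        self.history.append(completeness)
        return completeness

scorer = DriftAwareQualityScorer()
print(scorer.score([1.0, 2.0, None, 4.0]))    # 0.75
```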
{"title":"Adaptive data quality scoring operations framework using drift-aware mechanism for industrial applications","authors":"","doi":"10.1016/j.jss.2024.112184","DOIUrl":"10.1016/j.jss.2024.112184","url":null,"abstract":"<div><p>Within data-driven artificial intelligence (AI) systems for industrial applications, ensuring the reliability of the incoming data streams is an integral part of trustworthy decision-making. An approach to assess data validity is data quality scoring, which assigns a score to each data point or stream based on various quality dimensions. However, certain dimensions exhibit dynamic qualities, which require adaptation on the basis of the system’s current conditions. Existing methods often overlook this aspect, making them inefficient in dynamic production environments. In this paper, we introduce the Adaptive Data Quality Scoring Operations Framework, a novel framework developed to address the challenges posed by dynamic quality dimensions in industrial data streams. The framework introduces an innovative approach by integrating a dynamic change detector mechanism that actively monitors and adapts to changes in data quality, ensuring the relevance of quality scores. We evaluate the proposed framework performance in a real-world industrial use case. The experimental results reveal high predictive performance and efficient processing time, highlighting its effectiveness in practical quality-driven AI applications.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0164121224002280/pdfft?md5=11e8fc908484f0a491cea864fab6396e&pid=1-s2.0-S0164121224002280-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142002324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}