Do developer perceptions have borders? Comparing C code responses across continents
Pub Date: 2023-11-23 | DOI: 10.1007/s11219-023-09654-0
Yanyan Zhuang, Yu Yan, Lois Anne DeLong, Martin K. Yeh
Recent studies have empirically validated the existence of small patterns in C code, named atoms of confusion (or atoms for short), that can interfere with program comprehension. This research examines whether these patterns have a similar impact on a second group of participants who have comparable experience with C but come from a different part of the world. We report on studies conducted with students from the USA and China. Both sets of participants were shown snippets of code and asked to predict their output. While performance measures (accuracy and speed) showed little difference in aggregate, a few individual atoms yielded surprising results. For example, there were cases where the clarified versions of the code, with the atoms removed, were more confusing to the Chinese participants, even though the presence of atoms had much less impact on this group overall. These findings suggest that both the atoms themselves and the processes used to remove them may be perceived differently by individuals from different parts of the world. As such, developing insights into the “cross-border” applicability of coding practices could help create better pedagogical practices to prepare students for today’s globally integrated approach to software development.
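To make the notion of an atom concrete, the snippet below contrasts a confusing form with a clarified counterpart for one well-documented atom type (a post-increment embedded in a larger expression). The example is our own illustration and is not taken from the study's materials.

```c
#include <stdio.h>

int main(void) {
    /* Confusing form: a post-increment buried inside a larger expression,
     * a pattern the atoms-of-confusion literature flags as error-prone. */
    int v = 3;
    int confusing = v++ + 2;      /* uses the old value of v, then increments v */

    /* Clarified form: the side effect is pulled out into its own statement. */
    int w = 3;
    int clarified = w + 2;
    w = w + 1;

    printf("%d %d (v=%d, w=%d)\n", confusing, clarified, v, w);
    return 0;
}
```

Both versions compute the same sum, but in the clarified form the update of the counter is no longer hidden inside the arithmetic.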
{"title":"Do developer perceptions have borders? Comparing C code responses across continents","authors":"Yanyan Zhuang, Yu Yan, Lois Anne DeLong, Martin K. Yeh","doi":"10.1007/s11219-023-09654-0","DOIUrl":"https://doi.org/10.1007/s11219-023-09654-0","url":null,"abstract":"<p>Recent studies have empirically validated the existence of small patterns in C code, named atoms of confusion (or atoms for short), that can interfere with program comprehension. The focus of this research is an attempt to see if these patterns in C would have a similar impact on a second group of participants who have similar levels of experience with C, but come from different places. We report on studies conducted with students from the USA and China. Both sets of participants were shown snippets of code and asked to predict the output. While performance measures (accuracy and speed) showed little difference in aggregate, a few individual atoms yielded surprising results. For example, we found examples where the clarified versions of code, with the atoms removed, were <i>more confusing</i> to the Chinese participants, despite the presence of atoms having much less impact on this group in general. These findings suggest that both the atoms themselves, and the processes used to remove them, may be viewed differently by individuals from different parts of the world. As such, developing insights on the “cross-border” applicability of coding practices could help create better pedagogical practices to prepare students for today’s globally-integrated approach to software development.\u0000</p>","PeriodicalId":21827,"journal":{"name":"Software Quality Journal","volume":"32 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138517431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Keyword-labeled self-admitted technical debt and static code analysis have significant relationship but limited overlap
Pub Date: 2023-11-16 | DOI: 10.1007/s11219-023-09655-z
Leevi Rantala, Mika Mäntylä, Valentina Lenarduzzi
Technical debt represents sub-optimal choices made during development that are beneficial in the short term but not in the long run. Consciously admitted debt that is marked with a keyword, e.g., TODO, is called keyword-labeled self-admitted technical debt (KL-SATD). KL-SATD can lead to adverse effects in software development, e.g., a rise in complexity within the developed software. We investigated the relationship between KL-SATD from source code comments and reports from the highly popular industrial program analysis tool SonarQube. The goal was to find which SonarQube metrics and issues are related to KL-SATD introduction and removal, and how many KL-SATD comments in the context of an issue actually address that issue. We performed a study with 33 software repositories. We analyzed the changes in SonarQube reports (sqale index, reliability and security remediation metrics, and SonarQube issues) and their relationship to KL-SATD addition and removal with mixed model analysis. We manually annotated a sample to investigate how many KL-SATD comments appear in the context of SonarQube issues and how many address them directly. KL-SATD is associated with a reduction in code maintainability, measured with SonarQube’s sqale index. KL-SATD removal is associated with an increase in code maintainability (sqale index) and reliability, measured with SonarQube’s reliability remediation effort. The introduction and removal of KL-SATD are predominantly related to code smells, not to vulnerabilities or bugs. Manual annotation revealed that 36% of KL-SATD comments are in the context of a SonarQube issue, but only 15% of the comments address an issue. This means that, despite the statistical relationship between KL-SATD comments and SonarQube reports, a large set of KL-SATD comments lies in areas that SonarQube reports as clean, or free of maintainability issues. KL-SATD introduction and removal are connected mainly to code smells, tying them to maintainability rather than reliability or security. This is reinforced by the relationship with the sqale index, as well as by the dominance of code smells among SonarQube issues. Many KL-SATD issues have characteristics that go beyond static analysis tools and require future studies extending the capabilities of current tools. As KL-SATD comments and SonarQube reports appear to have limited overlap, they seem complementary, and both are needed to get comprehensive coverage of code maintainability. The study also presents rule violations developers should be aware of regarding KL-SATD introduction and removal.
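For readers unfamiliar with the term, the sketch below shows what a KL-SATD comment looks like in source code; the keyword, snippet, and deliberate shortcut are illustrative assumptions, not items from the study's dataset.

```c
#include <stdio.h>
#include <string.h>

/* TODO: handle names longer than the buffer instead of silently truncating.
 * A comment like the one above is keyword-labeled self-admitted technical debt
 * (KL-SATD): the developer admits a sub-optimal choice and marks it with a
 * keyword such as TODO or FIXME so it can be found and paid back later. */
static void copy_name(char *dst, size_t dst_len, const char *src) {
    strncpy(dst, src, dst_len - 1);   /* deliberate shortcut: truncation is unhandled */
    dst[dst_len - 1] = '\0';
}

int main(void) {
    char buf[8];
    copy_name(buf, sizeof buf, "a deliberately long name");
    printf("%s\n", buf);              /* prints the truncated name */
    return 0;
}
```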
{"title":"Keyword-labeled self-admitted technical debt and static code analysis have significant relationship but limited overlap","authors":"Leevi Rantala, Mika Mäntylä, Valentina Lenarduzzi","doi":"10.1007/s11219-023-09655-z","DOIUrl":"https://doi.org/10.1007/s11219-023-09655-z","url":null,"abstract":"<p>Technical debt presents sub-optimal choices made in development, which are beneficial in the short term but not in the long run. Consciously admitted debt, which is marked with a keyword, e.g., TODO, is called keyword-labeled self-admitted technical debt (KL-SATD). KL-SATD can lead to adverse effects in software development, e.g., to a rise in complexity within the developed software. We investigated the relationship between KL-SATD from source code comments and reports from the highly popular industrial program analysis tool SonarQube. The goal was to find which SonarQube metrics and issues are related to KL-SATD introduction and removal and how many KL-SATD in the context of an issue addresses that issue. We performed a study with 33 software repositories. We analyzed the changes in SonarQube reports (sqale index, reliability and security remediation metrics, and SonarQube issues) and the relationship to KL-SATD addition and removal with mixed model analysis. We manually annotated a sample to investigate how many KL-SATD comments are in the context of SonarQube issues and how many address them directly. KL-SATD is associated with a reduction in code maintainability measured with SonarQube’s sqale index. KL-SATD removal is associated with an increase in code maintainability (sqale index) and reliability measured with SonarQube’s reliability remediation effort. The introduction and removal of KL-SATD have a predominantly relationship with code smells, and not with vulnerabilities and bugs. Manual annotation revealed that 36% of KL-SATD comments are in the context of a SonarQube issue, but only 15% of the comment address an issue. This means that despite of statistical relationship between KL-SATD comments and SonarQube reports there is a large set of KL-SATD comments that are in areas that Sonarqube reports as clean or free of maintainability issues. KL-SATD introduction and removal are connected mainly to code smells, connecting them to maintainability rather than reliability or security. This is reinforced by the relationship with the sqale index, as well as the dominance of code smells in SonarQube issues. Many KL-SATD issues have characteristics going beyond static analysis tools and require future studies extending the capabilities of the current tools. As KL-SATD comments and SonarQube reports appear to have limited overlap, it suggests that they are complementary and both are needed for getting a comprehensive view coverage of code maintainability. The study also presents rules violations developers should be aware of regarding KL-SATD introduction and removal.\u0000</p>","PeriodicalId":21827,"journal":{"name":"Software Quality Journal","volume":"235 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138517478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lessons learned from replicating a study on information-retrieval-based test case prioritization
Pub Date: 2023-10-16 | DOI: 10.1007/s11219-023-09650-4
Nasir Mehmood Minhas, Mohsin Irshad, Kai Petersen, Jürgen Börstler
Replication studies help solidify and extend knowledge by evaluating previous studies’ findings. The software engineering literature shows that too few replications are conducted that focus on software artifacts without the involvement of humans. This study aims to replicate an artifact-based study on software testing to address this gap. In this investigation, we focus on (i) providing a step-by-step guide to the replication, reflecting on the challenges of replicating artifact-based testing research, and (ii) evaluating the replicated study concerning the validity and robustness of its findings. We replicate a test case prioritization technique proposed by Kwon et al. We replicated the original study using six software programs: four from the original study and two additional programs. We automated the steps of the original study using a Jupyter notebook to support future replications. Various general factors facilitating replications are identified, such as (1) the importance of documentation; (2) the need for assistance from the original authors; (3) issues in the maintenance of open-source repositories (e.g., concerning needed software dependencies and versioning); and (4) the availability of scripts. We also noted observations specific to the study and its context, such as insights from using different mutation tools and strategies for mutant generation. We conclude that the study by Kwon et al. is partially replicable for small software programs and could be automated to assist software practitioners, given the availability of the required information. However, it is hard to implement the technique for large software programs with the current guidelines. Based on the lessons learned, we suggest that the authors of original studies publish their data and experimental setup to support external replications.
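Because the replication hinges on mutant generation, a minimal example of the kind of seeded fault a mutation tool produces may help; the operator-replacement mutant below is our own sketch and not output from the tools used in the study.

```c
#include <stdio.h>

/* Original predicate under test. */
static int is_adult(int age) { return age >= 18; }

/* A relational-operator-replacement mutant: the kind of small seeded fault a
 * mutation tool generates. A test suite "kills" this mutant only if some test
 * distinguishes ">=" from ">", i.e., exercises the boundary age == 18. */
static int is_adult_mutant(int age) { return age > 18; }

int main(void) {
    int age = 18;  /* boundary input that tells the original and the mutant apart */
    printf("original=%d mutant=%d\n", is_adult(age), is_adult_mutant(age));
    return 0;
}
```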
{"title":"Lessons learned from replicating a study on information-retrieval-based test case prioritization","authors":"Nasir Mehmood Minhas, Mohsin Irshad, Kai Petersen, Jürgen Börstler","doi":"10.1007/s11219-023-09650-4","DOIUrl":"https://doi.org/10.1007/s11219-023-09650-4","url":null,"abstract":"Abstract Replication studies help solidify and extend knowledge by evaluating previous studies’ findings. Software engineering literature showed that too few replications are conducted focusing on software artifacts without the involvement of humans. This study aims to replicate an artifact-based study on software testing to address the gap related to replications. In this investigation, we focus on (i) providing a step-by-step guide of the replication, reflecting on challenges when replicating artifact-based testing research and (ii) evaluating the replicated study concerning the validity and robustness of the findings. We replicate a test case prioritization technique proposed by Kwon et al. We replicated the original study using six software programs, four from the original study and two additional software programs. We automated the steps of the original study using a Jupyter notebook to support future replications. Various general factors facilitating replications are identified, such as (1) the importance of documentation; (2) the need for assistance from the original authors; (3) issues in the maintenance of open-source repositories (e.g., concerning needed software dependencies, versioning); and (4) availability of scripts. We also noted observations specific to the study and its context, such as insights from using different mutation tools and strategies for mutant generation. We conclude that the study by Kwon et al. is partially replicable for small software programs and could be automated to facilitate software practitioners, given the availability of required information. However, it is hard to implement the technique for large software programs with the current guidelines. Based on lessons learned, we suggest that the authors of original studies need to publish their data and experimental setup to support the external replications.","PeriodicalId":21827,"journal":{"name":"Software Quality Journal","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136112943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying the severity of technical debt issues based on semantic and structural information
Pub Date: 2023-10-10 | DOI: 10.1007/s11219-023-09651-3
Dongjin Yu, Sicheng Li, Xin Chen, Tian Sun
{"title":"Identifying the severity of technical debt issues based on semantic and structural information","authors":"Dongjin Yu, Sicheng Li, Xin Chen, Tian Sun","doi":"10.1007/s11219-023-09651-3","DOIUrl":"https://doi.org/10.1007/s11219-023-09651-3","url":null,"abstract":"","PeriodicalId":21827,"journal":{"name":"Software Quality Journal","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136356535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
UX debt in an agile development process: evidence and characterization
Pub Date: 2023-10-07 | DOI: 10.1007/s11219-023-09652-2
Andres Rodriguez, Juan Cruz Gardey, Julian Grigera, Gustavo Rossi, Alejandra Garrido
{"title":"UX debt in an agile development process: evidence and characterization","authors":"Andres Rodriguez, Juan Cruz Gardey, Julian Grigera, Gustavo Rossi, Alejandra Garrido","doi":"10.1007/s11219-023-09652-2","DOIUrl":"https://doi.org/10.1007/s11219-023-09652-2","url":null,"abstract":"","PeriodicalId":21827,"journal":{"name":"Software Quality Journal","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135252730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model-driven gap analysis for the fulfillment of quality standards in software development processes
Pub Date: 2023-09-27 | DOI: 10.1007/s11219-023-09649-x
Giovanni Giachetti, José Luis de la Vara, Beatriz Marín
{"title":"Model-driven gap analysis for the fulfillment of quality standards in software development processes","authors":"Giovanni Giachetti, José Luis de la Vara, Beatriz Marín","doi":"10.1007/s11219-023-09649-x","DOIUrl":"https://doi.org/10.1007/s11219-023-09649-x","url":null,"abstract":"","PeriodicalId":21827,"journal":{"name":"Software Quality Journal","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
.NET/C# instrumentation for search-based software testing
Pub Date: 2023-09-01 | DOI: 10.1007/s11219-023-09645-1
Amid Golmohammadi, Man Zhang, Andrea Arcuri
{"title":".NET/C# instrumentation for search-based software testing","authors":"Amid Golmohammadi, Man Zhang, Andrea Arcuri","doi":"10.1007/s11219-023-09645-1","DOIUrl":"https://doi.org/10.1007/s11219-023-09645-1","url":null,"abstract":"","PeriodicalId":21827,"journal":{"name":"Software Quality Journal","volume":" ","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43576626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Correction to: TeqReq: a new family of test-related requirements attributes
Pub Date: 2023-08-29 | DOI: 10.1007/s11219-023-09647-z
Enrique Roncero, Andrés Silva
{"title":"Correction to: TeqReq: a new family of test‑related requirements attributes","authors":"Enrique Roncero, Andrés Silva","doi":"10.1007/s11219-023-09647-z","DOIUrl":"https://doi.org/10.1007/s11219-023-09647-z","url":null,"abstract":"","PeriodicalId":21827,"journal":{"name":"Software Quality Journal","volume":" ","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43305795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparative study of machine learning test case prioritization for continuous integration testing
Pub Date: 2023-07-28 | DOI: 10.1007/s11219-023-09646-0
Dusica Marijan
There is a growing body of research indicating the potential of machine learning to tackle complex software testing challenges. One such challenge pertains to continuous integration testing, which is highly time-constrained and generates a large amount of data from iterative code commits and test runs. In such a setting, we can use this plentiful test data to train machine learning predictors that identify test cases able to speed up the detection of regression bugs introduced during code integration. However, different machine learning models can have different fault prediction performance depending on the context and parameters of continuous integration testing, for example, the variable time budget available for continuous integration cycles or the size of the test execution history used for learning to prioritize failing test cases. Existing studies on test case prioritization rarely examine both of these factors, which are essential to continuous integration practice. In this study, we perform a comprehensive comparison of the fault prediction performance of machine learning approaches that have shown the best performance on test case prioritization tasks in the literature. We evaluate the accuracy of the classifiers in predicting fault-detecting tests for different values of the continuous integration time budget and with different lengths of test history used for training the classifiers. In the evaluation, we use real-world and augmented industrial datasets from a continuous integration practice. The results show that different machine learning models perform differently for different sizes of test history used for model training and for different time budgets available for test case execution. Our results imply that machine learning approaches for test prioritization in continuous integration testing should be carefully configured to achieve optimal performance.
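As a rough illustration of the kind of pipeline being compared, the sketch below trains a tiny logistic-regression predictor on hypothetical per-test features (recent failure rate, last verdict, duration) and then fills a continuous integration time budget with the highest-scoring tests. The features, toy data, and hand-rolled model are assumptions for illustration and do not reproduce the classifiers or datasets evaluated in the study.

```c
#include <math.h>
#include <stdio.h>

#define N_TRAIN 8
#define N_FEAT  3
#define N_TESTS 5

static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

int main(void) {
    /* Training rows from past CI cycles: {recent failure rate, last verdict,
     * normalized duration}, labeled 1 if the test failed in the next cycle. */
    double x[N_TRAIN][N_FEAT] = {
        {0.9, 1.0, 0.3}, {0.8, 1.0, 0.5}, {0.7, 0.0, 0.2}, {0.6, 1.0, 0.4},
        {0.2, 0.0, 0.6}, {0.1, 0.0, 0.1}, {0.0, 0.0, 0.8}, {0.1, 0.0, 0.3}
    };
    double y[N_TRAIN] = {1, 1, 1, 1, 0, 0, 0, 0};

    /* Train logistic regression with plain stochastic gradient descent. */
    double w[N_FEAT] = {0}, b = 0, lr = 0.5;
    for (int epoch = 0; epoch < 2000; epoch++) {
        for (int i = 0; i < N_TRAIN; i++) {
            double z = b;
            for (int j = 0; j < N_FEAT; j++) z += w[j] * x[i][j];
            double err = sigmoid(z) - y[i];
            for (int j = 0; j < N_FEAT; j++) w[j] -= lr * err * x[i][j];
            b -= lr * err;
        }
    }

    /* Current test suite: features plus expected runtime in minutes. */
    const char *name[N_TESTS] = {"t1", "t2", "t3", "t4", "t5"};
    double feat[N_TESTS][N_FEAT] = {
        {0.8, 1.0, 0.4}, {0.1, 0.0, 0.2}, {0.5, 0.0, 0.3},
        {0.0, 0.0, 0.7}, {0.9, 1.0, 0.6}
    };
    double runtime[N_TESTS] = {4, 2, 3, 7, 6};
    double score[N_TESTS];
    int order[N_TESTS];
    for (int i = 0; i < N_TESTS; i++) {
        double z = b;
        for (int j = 0; j < N_FEAT; j++) z += w[j] * feat[i][j];
        score[i] = sigmoid(z);     /* predicted probability of failure */
        order[i] = i;
    }

    /* Rank tests by predicted failure probability (descending). */
    for (int i = 0; i < N_TESTS; i++)
        for (int j = i + 1; j < N_TESTS; j++)
            if (score[order[j]] > score[order[i]]) {
                int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
            }

    /* Greedily fill the CI time budget with the highest-ranked tests. */
    double budget = 10, used = 0;
    printf("selected within %.0f min budget:\n", budget);
    for (int k = 0; k < N_TESTS; k++) {
        int i = order[k];
        if (used + runtime[i] > budget) continue;
        used += runtime[i];
        printf("  %s  p(fail)=%.2f  runtime=%.0f\n", name[i], score[i], runtime[i]);
    }
    return 0;
}
```

Varying the number of training rows (history length) and the budget value in such a setup is, in spirit, what the study does at scale with off-the-shelf classifiers and industrial data.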
{"title":"Comparative study of machine learning test case prioritization for continuous integration testing","authors":"Dusica Marijan","doi":"10.1007/s11219-023-09646-0","DOIUrl":"https://doi.org/10.1007/s11219-023-09646-0","url":null,"abstract":"There is a growing body of research indicating the potential of machine learning to tackle complex software testing challenges. One such challenge pertains to continuous integration testing, which is highly time-constrained, and generates a large amount of data coming from iterative code commits and test runs. In such a setting, we can use plentiful test data for training machine learning predictors to identify test cases able to speed up the detection of regression bugs introduced during code integration. However, different machine learning models can have different fault prediction performance depending on the context and the parameters of continuous integration testing, for example, variable time budget available for continuous integration cycles, or the size of test execution history used for learning to prioritize failing test cases. Existing studies on test case prioritization rarely study both of these factors, which are essential for the continuous integration practice. In this study, we perform a comprehensive comparison of the fault prediction performance of machine learning approaches that have shown the best performance on test case prioritization tasks in the literature. We evaluate the accuracy of the classifiers in predicting fault-detecting tests for different values of the continuous integration time budget and with different lengths of test history used for training the classifiers. In evaluation, we use real-world and augmented industrial datasets from a continuous integration practice. The results show that different machine learning models have different performance for different size of test history used for model training and for different time budgets available for test case execution. Our results imply that machine learning approaches for test prioritization in continuous integration testing should be carefully configured to achieve optimal performance.","PeriodicalId":21827,"journal":{"name":"Software Quality Journal","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135556996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}