Oleksandra Klymenko, O. Kosenkov, Stephen Meisenbacher, Parisa Elahidoost, D. Méndez, F. Matthes
Background: Modern privacy regulations, such as the General Data Protection Regulation (GDPR), address privacy in software systems in a technologically agnostic way by mentioning general “technical measures” for data privacy compliance rather than dictating how these should be implemented. An understanding of the concept of technical measures and of how exactly they can be handled in practice, however, is not trivial due to their interdisciplinary nature and the necessary technical-legal interactions. Aims: We aim to investigate how the concept of technical measures for data privacy compliance is understood in practice, as well as the technical-legal interaction intrinsic to the process of implementing those measures. Methods: We follow a research design that is 1) exploratory in nature, 2) qualitative, and 3) interview-based, with 16 selected privacy professionals from the technical and legal domains. Results: Our results suggest that there is no clear mutual understanding of, and no commonly accepted approach to, handling technical measures. Both technical and legal roles are involved in implementing such measures. While they still often operate in separate spheres, the predominant opinion amongst the interviewees is to promote more interdisciplinary collaboration. Conclusions: Our empirical findings confirm the need for better interaction between legal and engineering teams when implementing technical measures for data privacy. We posit that interdisciplinary collaboration is paramount to a more complete understanding of technical measures, a concept that currently lacks a mutually accepted notion. Yet, as strongly suggested by our results, there is still a lack of systematic approaches to such interaction. Therefore, the results strengthen our confidence in the need for further investigations into the technical-legal dynamics of data privacy compliance.
{"title":"Understanding the Implementation of Technical Measures in the Process of Data Privacy Compliance: A Qualitative Study","authors":"Oleksandra Klymenko, O. Kosenkov, Stephen Meisenbacher, Parisa Elahidoost, D. Méndez, F. Matthes","doi":"10.1145/3544902.3546234","DOIUrl":"https://doi.org/10.1145/3544902.3546234","url":null,"abstract":"Background: Modern privacy regulations, such as the General Data Protection Regulation (GDPR), address privacy in software systems in a technologically agnostic way by mentioning general ”technical measures” for data privacy compliance rather than dictating how these should be implemented. An understanding of the concept of technical measures and how exactly these can be handled in practice, however, is not trivial due to its interdisciplinary nature and the necessary technical-legal interactions. Aims: We aim to investigate how the concept of technical measures for data privacy compliance is understood in practice as well as the technical-legal interaction intrinsic to the process of implementing those technical measures. Methods: We follow a research design that is 1) exploratory in nature, 2) qualitative, and 3) interview-based, with 16 selected privacy professionals in the technical and legal domains. Results: Our results suggest that there is no clear mutual understanding and commonly accepted approach to handling technical measures. Both technical and legal roles are involved in the implementation of such measures. While they still often operate in separate spheres, a predominant opinion amongst the interviewees is to promote more interdisciplinary collaboration. Conclusions: Our empirical findings confirm the need for better interaction between legal and engineering teams when implementing technical measures for data privacy. We posit that interdisciplinary collaboration is paramount to a more complete understanding of technical measures, which currently lacks a mutually accepted notion. Yet, as strongly suggested by our results, there is still a lack of systematic approaches to such interaction. Therefore, the results strengthen our confidence in the need for further investigations into the technical-legal dynamic of data privacy compliance.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130343440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Otávio Cury da Costa Castro, G. Avelino, P. Neto, Ricardo Britto, M. T. Valente
Background: In software development, the identification of source code file experts is an important task. Identifying these experts helps to improve software maintenance and evolution activities, such as developing new features, code reviews, and bug fixes. Although some studies have proposed repository-mining techniques to automatically identify source code experts, there are still gaps in this area that can be explored, for example, investigating new variables related to source code knowledge and applying machine learning with the aim of improving the performance of techniques that identify source code experts. Aim: The goal of this study is to investigate opportunities to improve the performance of existing techniques to recommend source code file experts. Method: We built an oracle by collecting data from the development history and surveying developers of 113 software projects. Then, we used this oracle to: (i) analyze the correlation between measures extracted from the development history and the developers’ source code knowledge and (ii) investigate the use of machine learning classifiers by evaluating their performance in identifying source code file experts. Results: First Authorship and Recency of Modification are the variables with the highest positive and negative correlations with source code knowledge, respectively. Machine learning classifiers outperformed the linear techniques (F-measure = 71% to 73%) in the public dataset, but this advantage is not clear in the private dataset, with F-measure ranging from 55% to 68% for the linear techniques and 58% to 67% for the ML techniques. Conclusion: Overall, the linear techniques and the machine learning classifiers achieved similar performance, particularly in terms of F-measure. However, the machine learning classifiers usually achieved higher precision, while the linear techniques obtained higher recall. Therefore, the choice of the best technique depends on the user’s tolerance for false positives and false negatives.
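To make the two strongest predictors above concrete, here is a minimal sketch, assuming pydriller and scikit-learn, of how First Authorship and Recency of Modification could be derived from a repository's history and fed to a classifier. The feature definitions, placeholder labels, and choice of a random forest are illustrative assumptions, not the authors' implementation (their oracle comes from a developer survey).

```python
from collections import defaultdict
from datetime import datetime, timezone

from pydriller import Repository                # pip install pydriller (2.x API)
from sklearn.ensemble import RandomForestClassifier

first_author = {}                               # file path -> first committer
last_modified = defaultdict(dict)               # file path -> {developer: last commit date}

# Walk the history once, tracking who created each file and when each
# developer last touched it.
for commit in Repository("path/to/repo").traverse_commits():
    for mf in commit.modified_files:
        path = mf.new_path or mf.old_path
        first_author.setdefault(path, commit.author.name)
        last_modified[path][commit.author.name] = commit.committer_date

def features(path, dev):
    """Feature row for one (file, developer) pair: the two measures above."""
    is_first_author = float(first_author.get(path) == dev)
    last = last_modified[path].get(dev)
    days_since_touch = (datetime.now(timezone.utc) - last).days if last else 10_000
    return [is_first_author, days_since_touch]

pairs = [(p, d) for p in last_modified for d in last_modified[p]]
X = [features(p, d) for p, d in pairs]
y = [0] * len(X)                                # hypothetical expert labels from an oracle
clf = RandomForestClassifier().fit(X, y)
```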
{"title":"Identifying Source Code File Experts","authors":"Otávio Cury da Costa Castro, G. Avelino, P. Neto, Ricardo Britto, M. T. Valente","doi":"10.1145/3544902.3546243","DOIUrl":"https://doi.org/10.1145/3544902.3546243","url":null,"abstract":"Background: In software development, the identification of source code file experts is an important task. Identifying these experts helps to improve software maintenance and evolution activities, such as developing new features, code reviews, and bug fixes. Although some studies have proposed repository-mining techniques to automatically identify source code experts, there are still gaps in this area that can be explored. For example, investigating new variables related to source code knowledge and applying machine learning aiming to improve the performance of techniques to identify source code experts. Aim: The goal of this study is to investigate opportunities to improve the performance of existing techniques to recommend source code files experts. Method: We built an oracle by collecting data from the development history and surveying developers of 113 software projects. Then, we use this oracle to: (i) analyze the correlation between measures extracted from the development history and the developers’ source code knowledge and (ii) investigate the use of machine learning classifiers by evaluating their performance in identifying source code files experts. Results:First Authorship and Recency of Modification are the variables with the highest positive and negative correlations with source code knowledge, respectively. Machine learning classifiers outperformed the linear techniques (F-Measure = 71% to 73%) in the public dataset, but this advantage is not clear in the private dataset, with F-Measure ranging from 55% to 68% for the linear techniques and 58% to 67% for ML techniques. Conclusion: Overall, the linear techniques and the machine learning classifiers achieved similar performance, particularly if we analyze F-Measure. However, machine learning classifiers usually get higher precision while linear techniques obtained the highest recall values. Therefore, the choice of the best technique depends on the user’s tolerance to false positives and false negatives.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"203 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122434276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Much research has been conducted to investigate the impact of Continuous Integration (CI) on the productivity and quality of open-source projects. Most studies have analyzed the impact of adopting a CI server service (e.g., Travis-CI) but did not analyze CI sub-practices. Aims: We aim to evaluate the impact of five CI sub-practices with respect to the productivity and quality of GitHub open-source projects. Method: We collect CI sub-practices of 90 relevant open-source projects over a period of 2 years. We use regression models to analyze whether projects upholding the CI sub-practices are more productive and/or generate fewer bugs. We also perform a qualitative document analysis to understand whether CI best practices are related to higher project quality. Results: Our findings reveal a correlation between the Build Activity and Commit Activity sub-practices and the number of merged pull requests. We also observe a correlation between the Build Activity, Build Health, and Time to Fix Broken Builds sub-practices and the number of bug-related issues. The qualitative analysis reveals that projects with the best values for the CI sub-practices face fewer CI-related problems compared to projects that exhibit the worst values. Conclusions: We recommend that projects strive to uphold these CI sub-practices, as they can impact the productivity and quality of projects.
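As a rough sketch of the regression setup described above (not the authors' scripts), one could regress the two outcome variables on per-project sub-practice metrics with statsmodels; the column names below are invented for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf           # pip install pandas statsmodels

# Hypothetical table: one row per project, with sub-practice metrics and
# the two outcomes analyzed in the study.
df = pd.read_csv("ci_subpractices.csv")

# Productivity: number of merged pull requests vs. CI sub-practices.
productivity = smf.ols(
    "merged_pull_requests ~ build_activity + commit_activity"
    " + build_health + time_to_fix_broken_builds",
    data=df,
).fit()
print(productivity.summary())

# Quality: number of bug-related issues vs. CI sub-practices.
quality = smf.ols(
    "bug_related_issues ~ build_activity + build_health"
    " + time_to_fix_broken_builds",
    data=df,
).fit()
print(quality.summary())
```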
{"title":"Investigating the Impact of Continuous Integration Practices on the Productivity and Quality of Open-Source Projects","authors":"Jadson Santos, D. A. D. Costa, U. Kulesza","doi":"10.1145/3544902.3546244","DOIUrl":"https://doi.org/10.1145/3544902.3546244","url":null,"abstract":"Background: Much research has been conducted to investigate the impact of Continuous Integration (CI) on the productivity and quality of open-source projects. Most of studies have analyzed the impact of adopting a CI server service (e.g, Travis-CI) but did not analyze CI sub-practices. Aims: We aim to evaluate the impact of five CI sub-practices with respect to the productivity and quality of GitHub open-source projects. Method: We collect CI sub-practices of 90 relevant open-source projects for a period of 2 years. We use regression models to analyze whether projects upholding the CI sub-practices are more productive and/or generate fewer bugs. We also perform a qualitative document analysis to understand whether CI best practices are related to a higher quality of projects. Results: Our findings reveal a correlation between the Build Activity and Commit Activity sub-practices and the number of merged pull requests. We also observe a correlation between the Build Activity, Build Health and Time to Fix Broken Builds sub-practices and number of bug-related issues. The qualitative analysis reveals that projects with the best values for CI sub-practices face fewer CI-related problems compared to projects that exhibit the worst values for CI sub-practices. Conclusions: We recommend that projects should strive to uphold the several CI sub-practices as they can impact in the productivity and quality of projects.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115887429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Code reviewing is an essential part of software development to ensure software quality. However, the abundance of review tasks and the intensity of the workload for reviewers negatively impact the quality of the reviews. Short review texts are often unactionable. Aims: We propose the Example Driven Review Explanation (EDRE) method to facilitate the code review process by adding explanations through examples. EDRE recommends similar code reviews as examples to further explain a review and help developers understand the received reviews with less communication overhead. Method: Through an empirical study in an industrial setting and by analyzing 3,722 code reviews across three open-source projects, we compared five methods of data retrieval, text classification, and text recommendation. Results: EDRE using TF-IDF word embeddings along with an SVM classifier can provide practical examples for each code review with an F-score of 92% and an accuracy of 90%. Conclusions: Example-based explanation is an established method for assisting experts in explaining decisions. EDRE can accurately provide a set of context-specific examples to facilitate the code review process in software teams.
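The abstract names the winning combination (TF-IDF features with an SVM classifier, plus retrieval of similar past reviews as examples); below is a minimal sketch of that combination, assuming scikit-learn. The toy corpus, the category labels, and the cosine-similarity retrieval step are illustrative assumptions rather than the authors' code; LinearSVC stands in for the SVM mentioned above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import LinearSVC

# Historical review comments with hypothetical category labels.
past_reviews = [
    "Extract this logic into a helper method.",
    "Please add a unit test covering the null case.",
]
labels = ["refactoring", "testing"]

vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(past_reviews)
classifier = LinearSVC().fit(X, labels)

def explain(new_review: str, k: int = 3):
    """Classify a new review and retrieve the k most similar past reviews
    to serve as concrete examples for the developer."""
    q = vectorizer.transform([new_review])
    category = classifier.predict(q)[0]
    similarity = cosine_similarity(q, X).ravel()
    examples = [past_reviews[i] for i in similarity.argsort()[::-1][:k]]
    return category, examples

print(explain("Move this into its own function."))
```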
{"title":"Example Driven Code Review Explanation","authors":"Shadikur Rahman, Umme Ayman Koana, Maleknaz Nayebi","doi":"10.1145/3544902.3546639","DOIUrl":"https://doi.org/10.1145/3544902.3546639","url":null,"abstract":"Background: Code reviewing is an essential part of software development to ensure software quality. However, the abundance of review tasks and the intensity of the workload for reviewers negatively impact the quality of the reviews. The short review text is often unactionable. Aims: We propose the Example Driven Review Explanation (EDRE) method to facilitate the code review process by adding additional explanations through examples. EDRE recommends similar code reviews as examples to further explain a review and help a developer to understand the received reviews with less communication overhead. Method: Through an empirical study in an industrial setting and by analyzing 3,722 Code reviews across three open-source projects, we compared five methods of data retrieval, text classification, and text recommendation. Results: EDRE using TF-IDF word embedding along with an SVM classifier can provide practical examples for each code review with 92% F-score and 90% Accuracy. Conclusions: The example-based explanation is an established method for assisting experts in explaining decisions. EDRE can accurately provide a set of context-specific examples to facilitate the code review process in software teams.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127132948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Software development results in the production of various types of artifacts: source code, version control system metadata, bug reports, mailing list conversations, test data, etc. Empirical software engineering (ESE) has thrived on mining those artifacts to uncover the inner workings of software development and improve its practices. But which artifacts are studied in the field is a moving target, which we study empirically in this paper. Aims: We quantitatively characterize the most frequently mined and co-mined software artifacts in ESE research and the research purposes they support. Method: We conduct a meta-analysis of artifact mining studies published in 11 top conferences in ESE, for a total of 9,621 papers. We use natural language processing (NLP) techniques to characterize the types of software artifacts that are most often mined and their evolution over a 16-year period (2004–2020). We analyze the combinations of artifact types that are most often mined together, as well as the relationship between study purposes and mined artifacts. Results: We find that: (1) mining happens in the vast majority of analyzed papers, (2) source code and test data are the most mined artifacts, (3) there is an increasing interest in mining novel artifacts, together with source code, and (4) researchers are most interested in the evaluation of software systems and use all possible empirical signals to support that goal. Conclusions: Our study presents a meta-analysis of the usage of software artifacts in the field over a period of 16 years using NLP techniques.
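One NLP step such a meta-analysis plausibly involves is tagging which artifact types a paper mentions. The sketch below uses naive keyword matching over abstracts; the keyword lists and taxonomy are assumptions, and the study's actual techniques are more involved.

```python
from collections import Counter
from itertools import combinations

# Assumed keyword lists; the paper's artifact taxonomy is richer.
ARTIFACT_KEYWORDS = {
    "source code": ["source code", "codebase", "source files"],
    "VCS metadata": ["commit", "version control", "git history"],
    "bug reports": ["bug report", "issue tracker"],
    "test data": ["test suite", "test case", "test data"],
    "mailing lists": ["mailing list", "email archive"],
}

def tag_artifacts(abstract: str) -> set:
    """Return the artifact types whose keywords appear in the abstract."""
    text = abstract.lower()
    return {artifact for artifact, keywords in ARTIFACT_KEYWORDS.items()
            if any(kw in text for kw in keywords)}

# Aggregate over a corpus to see which artifacts are mined and co-mined.
corpus = ["We mine commit histories and bug reports to study defects."]  # placeholder
mined, co_mined = Counter(), Counter()
for doc in corpus:
    tags = tag_artifacts(doc)
    mined.update(tags)
    co_mined.update(combinations(sorted(tags), 2))
print(mined.most_common(), co_mined.most_common())
```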
{"title":"Software Artifact Mining in Software Engineering Conferences: A Meta-Analysis","authors":"Zeinab Abou Khalil, Stefano Zacchiroli","doi":"10.1145/3544902.3546239","DOIUrl":"https://doi.org/10.1145/3544902.3546239","url":null,"abstract":"Background: Software development results in the production of various types of artifacts: source code, version control system metadata, bug reports, mailing list conversations, test data, etc. Empirical software engineering (ESE) has thrived mining those artifacts to uncover the inner workings of software development and improve its practices. But which artifacts are studied in the field is a moving target, which we study empirically in this paper. Aims: We quantitatively characterize the most frequently mined and co-mined software artifacts in ESE research and the research purposes they support. Method: We conduct a meta-analysis of artifact mining studies published in 11 top conferences in ESE, for a total of 9621 papers. We use natural language processing (NLP) techniques to characterize the types of software artifacts that are most often mined and their evolution over a 16-year period (2004–2020). We analyze the combinations of artifact types that are most often mined together, as well as the relationship between study purposes and mined artifacts. Results: We find that: (1) mining happens in the vast majority of analyzed papers, (2) source code and test data are the most mined artifacts, (3) there is an increasing interest in mining novel artifacts, together with source code, (4) researchers are most interested in the evaluation of software systems and use all possible empirical signals to support that goal. Conclusions: Our study presents a meta analysis of the usage of software artifacts in the field over a period of 16 years using NLP techniques.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123859463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Larissa Braz, Enrico Fregnan, Vivek Arora, Alberto Bacchelli
Background: Security regressions are vulnerabilities introduced in a previously unaffected software system. They often happen as a result of code changes (e.g., a bug fix) and can have severe effects. Aims: We aim to increase the understanding of security regressions. Method: To this aim, we perform an exploratory, mixed-method case study of Mozilla. First, we analyze 78 regression vulnerabilities and 72 bug reports where a bug fix introduced a regression vulnerability at Mozilla. We investigate how developers interact in these bug reports, how they perform the changes, and under what conditions they introduce these regressions. Second, we conduct five semi-structured interviews with Mozilla developers involved in the vulnerability-inducing fixes. Results: Security is not discussed during bug fixes. Developers’ main concerns are the complexity of the bug at hand and the community pressure to fix it. Developers do not worry about regression vulnerabilities and assume that tools will detect them. Indeed, dynamic analysis tools helped find around 30% of these regressions. Conclusions: Although tool support helps identify regression vulnerabilities, it may not be enough to ensure security during bug fixes. Furthermore, our results call for further work on security tooling support and its integration into bug fixes. Preprint: https://arxiv.org/abs/2207.01942 Data and materials: https://doi.org/10.5281/zenodo.6792317
{"title":"An Exploratory Study on Regression Vulnerabilities","authors":"Larissa Braz, Enrico Fregnan, Vivek Arora, Alberto Bacchelli","doi":"10.1145/3544902.3546250","DOIUrl":"https://doi.org/10.1145/3544902.3546250","url":null,"abstract":"Background: Security regressions are vulnerabilities introduced in a previously unaffected software system. They often happen as a result of code changes (e.g., a bug fix) and can have severe effects. Aims: We aim to increase the understanding of security regressions. Method: To this aim, we perform an exploratory, mixed-method case study of Mozilla. First, we analyze 78 regression vulnerabilities and 72 bug reports where a bug fix introduced a regression vulnerability at Mozilla. We investigate how developers interact in these bug reports, how they perform the changes, and under what conditions they introduce these regressions. Second, we conduct five semi-structured interviews with as many Mozilla developers involved in the vulnerability-inducing fixes. Results: Security is not discussed during bug fixes. Developers’ main concerns are the complexity of the bug at hand and the community pressure to fix it. Developers do not to worry about regression vulnerabilities and assume tools will detect them. Indeed, dynamic analysis tools helped finding around 30% of these regressions. Conclusions: Although tool support helps identify regression vulnerabilities, it may not be enough to ensure security during bug fixes. Furthermore, our results call for further work on the security tooling support and their integration during bug fixes. Preprint: https://arxiv.org/abs/2207.01942 Data and materials: https://doi.org/10.5281/zenodo.6792317","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132619905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
[Background:] Teamwork, coordination, and communication are prerequisites for the timely completion of a software project. Meetings, as facilitators of coordination and communication, are an established medium for information exchange. Analyses of meetings in software projects have shown that certain interactions in these meetings, such as proactive statements followed by supportive ones, influence the mood and motivation of a team, which in turn affects its productivity. So far, however, research has focused only on certain interactions at a detailed level, requiring a complex and fine-grained analysis of the meeting itself. [Aim:] In this paper, we investigate meetings from a more abstract perspective, focusing on the polarity of the statements, i.e., whether they appear to be positive, negative, or neutral. [Method:] We analyze the relationship between the polarity of statements in meetings and different social aspects, including conflicts as well as the mood before and after a meeting. [Results:] Our results emerge from 21 student software project meetings and provide some interesting insights: (1) Positive mood before a meeting is related to the amount of positive statements both at the beginning and throughout the whole meeting, (2) negative mood before the meeting only influences the amount of negative statements in the first quarter of the meeting, but not the whole meeting, and (3) the amount of positive and negative statements during the meeting has no influence on the mood afterwards. [Conclusions:] We conclude that behaviour in meetings is more likely to influence short-term emotional states (feelings) than long-term emotional states (mood), which are more important for the project.
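As a sketch of statement-level polarity coding, the snippet below uses the off-the-shelf VADER sentiment analyzer from NLTK as a stand-in; the paper's actual coding scheme for meeting statements may well differ, and the thresholds here are VADER's conventional defaults.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

def polarity(statement: str) -> str:
    """Map a meeting statement to positive / negative / neutral."""
    compound = analyzer.polarity_scores(statement)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

meeting = [
    "Great job on the login feature!",
    "The build is broken again and nobody noticed.",
    "Next item: scheduling the sprint review.",
]
labels = [polarity(s) for s in meeting]
# Compare the first quarter of the meeting against the whole, as in result (2).
first_quarter = labels[: max(1, len(labels) // 4)]
print(labels, "first quarter:", first_quarter)
```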
{"title":"Meetings and Mood – Related or Not? Insights from Student Software Projects","authors":"Jil Klunder, Oliver Karras","doi":"10.1145/3544902.3546252","DOIUrl":"https://doi.org/10.1145/3544902.3546252","url":null,"abstract":"[Background:] Teamwork, coordination, and communication are a prerequisite for the timely completion of a software project. Meetings as a facilitator for coordination and communication are an established medium for information exchange. Analyses of meetings in software projects have shown that certain interactions in these meetings, such as proactive statements followed by supportive ones, influence the mood and motivation of a team, which in turn affects its productivity. So far, however, research has focused only on certain interactions at a detailed level, requiring a complex and fine-grained analysis of a meeting itself. [Aim:] In this paper, we investigate meetings from a more abstract perspective, focusing on the polarity of the statements, i.e., whether they appear to be positive, negative, or neutral. [Method:] We analyze the relationship between the polarity of statements in meetings and different social aspects, including conflicts as well as the mood before and after a meeting. [Results:] Our results emerge from 21 student software project meetings and show some interesting insights: (1) Positive mood before a meeting is both related to the amount of positive statements in the beginning, as well as throughout the whole meeting, (2) negative mood before the meeting only influences the amount of negative statements in the first quarter of the meeting, but not the whole meeting, and (3) the amount of positive and negative statements during the meeting has no influence on the mood afterwards. [Conclusions:] We conclude that the behaviour in meetings might rather influence short-term emotional states (feelings) than long-term emotional states (mood), which are more important for the project.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129801985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liming Fu, Peng Liang, Z. Rasheed, Zengyang Li, Amjed Tahir, Xiaofeng Han
Background: Technical Debt (TD) refers to the situation where developers make trade-offs to achieve short-term goals at the expense of long-term code quality, which can have a negative impact on the quality of software systems. In the context of code review, such sub-optimal implementations have a chance to be resolved in a timely manner during the review process, before the code is merged. Therefore, we consider them Potential Technical Debt (PTD), since PTD evolves into TD when it is injected into software systems without being resolved. Aim: To date, little is known about the extent to which PTD is identified in code reviews. Many tools have been provided to detect TD, but these tools lack consensus, and a large amount of PTD is undetectable by tools, whereas code review can help verify the quality of committed code by identifying issues such as PTD. To this end, we conducted an exploratory study in an attempt to understand the nature of PTD in code reviews and to track the resolution of PTD after it is identified. Method: We randomly collected 2,030 review comments from the Nova project of OpenStack and the Qt Base project of Qt. We then manually checked these review comments and obtained 163 PTD-related review comments for further analysis. Results: Our results show that: (1) PTD can be identified in code reviews but is not prevalent. (2) Design, defect, documentation, requirement, test, and code PTD are identified in code reviews, of which code and documentation PTD are dominant. (3) 81.0% of the PTD identified in code reviews was resolved by developers, and 78.0% of the resolved PTD was resolved within a week. (4) Code refactoring is the main practice used by developers to resolve the PTD identified in code reviews. Conclusions: Our findings indicate that: (1) review-based detection of PTD is seen as one of the trustworthy mechanisms in development, and (2) a significant proportion of PTD (19.0%) remains unresolved when injected into the software systems. Practitioners and researchers should establish effective strategies to manage and resolve PTD in development.
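Note that the authors identified PTD by manual inspection. Purely for illustration, a keyword pre-filter like the following sketch could at best narrow down candidate comments before such a manual pass; the hint list is an assumption, not part of the study.

```python
# Debt-flavored phrases that might hint at PTD in a review comment.
TD_HINTS = [
    "hack", "workaround", "todo", "fixme", "temporary", "for now",
    "refactor", "clean up", "technical debt",
]

def looks_like_ptd(comment: str) -> bool:
    """True if the review comment contains any debt-flavored hint."""
    text = comment.lower()
    return any(hint in text for hint in TD_HINTS)

review_comments: list = []  # would hold the collected review comments
candidates = [c for c in review_comments if looks_like_ptd(c)]
```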
{"title":"Potential Technical Debt and Its Resolution in Code Reviews: An Exploratory Study of the OpenStack and Qt Communities","authors":"Liming Fu, Peng Liang, Z. Rasheed, Zengyang Li, Amjed Tahir, Xiaofeng Han","doi":"10.1145/3544902.3546253","DOIUrl":"https://doi.org/10.1145/3544902.3546253","url":null,"abstract":"Background: Technical Debt (TD) refers to the situation where developers make trade-offs to achieve short-term goals at the expense of long-term code quality, which can have a negative impact on the quality of software systems. In the context of code review, such sub-optimal implementations have chances to be timely resolved during the review process before the code is merged. Therefore, we could consider them as Potential Technical Debt (PTD) since PTD will evolve into TD when it is injected into software systems without being resolved. Aim: To date, little is known about the extent to which PTD is identified in code reviews. Many tools have been provided to detect TD, but these tools lack consensus and a large amount of PTD are undetectable by tools while code review could help verify the quality of code that has been committed by identifying issues, such as PTD. To this end, we conducted an exploratory study in an attempt to understand the nature of PTD in code reviews and track down the resolution of PTD after being identified. Method: We randomly collected 2,030 review comments from the Nova project of OpenStack and the Qt Base project of Qt. We then manually checked these review comments, and obtained 163 PTD-related review comments for further analysis. Results: Our results show that: (1) PTD can be identified in code reviews but is not prevalent. (2) Design, defect, documentation, requirement, test, and code PTD are identified in code reviews, in which code and documentation PTD are the dominant. (3) 81.0% of the PTD identified in code reviews has been resolved by developers, and 78.0% of the resolved TD was resolved by developers within a week. (4) Code refactoring is the main practice used by developers to resolve the PTD identified in code reviews. Conclusions: Our findings indicate that: (1) review-based detection of PTD is seen as one of the trustworthy mechanisms in development, and (2) there is still a significant proportion of PTD (19.0%) remaining unresolved when injected into the software systems. Practitioners and researchers should establish effective strategies to manage and resolve PTD in development.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134172449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background. Software effort can be measured in story points [35]. Story point estimation is important in software project planning. Current approaches for automatically estimating story points focus on applying pre-trained embedding models and deep learning for text regression. These approaches require expensive embedding models and face the challenge that a plain sequence of text may not be an efficient representation for software issues, which can be a combination of text and code. Aims. We propose HeteroSP, a tool for estimating story points from the textual input of Agile software project issues. We select GPT2SP [12] and Deep-SE [8] as the baselines for comparison. Method. First, from an analysis of the story point dataset [8], we conclude that software issues are actually a mixture of natural language sentences and quoted code snippets, and suffer from problems related to a large vocabulary. Second, we provide a module to normalize the input text, including the words and code tokens of the software issues. Third, we design an algorithm to convert an input software issue into a graph with different types of nodes and edges. Fourth, we construct a heterogeneous graph neural network model, using fastText [6] to construct the initial node embeddings, to learn and predict the story points of new issues. Results. We compared against the baselines over three estimation scenarios: within-project, cross-project within-repository, and cross-project cross-repository. We achieve average Mean Absolute Errors (MAE) of 2.38, 2.61, and 2.63 for the three scenarios, respectively. We outperform GPT2SP in two of the three scenarios, and outperform Deep-SE in the most challenging scenario with significantly less running time. We also compare our approach with different homogeneous graph neural network models, and the results show that the heterogeneous model outperforms the homogeneous ones in story point estimation. For time performance, the three steps of node embedding initialization, model construction, and story point estimation together take about 570 seconds. HeteroSP’s artifacts are available at [22]. Conclusion. HeteroSP, a heterogeneous graph neural network model for story point estimation, achieves good accuracy and running time.
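Below is a much-simplified sketch, assuming PyTorch Geometric, of a heterogeneous graph neural network for story-point regression in the spirit of the description above. The node/edge types, dimensions, random stand-in features (in place of fastText vectors), and pooling are all illustrative assumptions; the actual HeteroSP architecture is described in the paper and its artifacts [22].

```python
import torch
from torch_geometric.data import HeteroData            # pip install torch_geometric
from torch_geometric.nn import HeteroConv, SAGEConv

# Toy issue graph: two node types (text tokens, code tokens), two edge types.
data = HeteroData()
data["word"].x = torch.randn(6, 100)                    # stand-in for fastText vectors
data["code"].x = torch.randn(4, 100)
data["word", "next", "word"].edge_index = torch.tensor([[0, 1, 2, 3, 4],
                                                        [1, 2, 3, 4, 5]])
data["word", "mentions", "code"].edge_index = torch.tensor([[0, 2, 4],
                                                            [0, 1, 3]])

class StoryPointGNN(torch.nn.Module):
    def __init__(self, in_dim: int = 100, hidden: int = 64):
        super().__init__()
        # One convolution per edge type; messages aggregated per node type.
        self.conv = HeteroConv({
            ("word", "next", "word"): SAGEConv(in_dim, hidden),
            ("word", "mentions", "code"): SAGEConv((in_dim, in_dim), hidden),
        }, aggr="sum")
        self.head = torch.nn.Linear(hidden, 1)          # regression head

    def forward(self, data: HeteroData) -> torch.Tensor:
        h = self.conv(data.x_dict, data.edge_index_dict)
        # Mean-pool each node type, then average types into one issue vector.
        issue_vec = torch.stack([x.mean(dim=0) for x in h.values()]).mean(dim=0)
        return self.head(issue_vec)

model = StoryPointGNN()
predicted = model(data)
loss = torch.nn.functional.mse_loss(predicted.squeeze(),
                                    torch.tensor(3.0))  # hypothetical story-point label
```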
{"title":"Heterogeneous Graph Neural Networks for Software Effort Estimation","authors":"H. Phan, A. Jannesari","doi":"10.1145/3544902.3546248","DOIUrl":"https://doi.org/10.1145/3544902.3546248","url":null,"abstract":"Background. Software effort can be measured by story point [35]. Story point estimation is important in software projects’ planning. Current approaches for automatically estimating story points focus on applying pre-trained embedding models and deep learning for text regression to solve this problem. These approaches require expensive embedding models and confront challenges that the sequence of text might not be an efficient representation for software issues which can be the combination of text and code. Aims. We propose HeteroSP, a tool for estimating story points from textual input of Agile software project issues. We select GPT2SP [12] and Deep-SE [8] as the baselines for comparison. Method. First, from the analysis of the story point dataset [8], we conclude that software issues are actually a mixture of natural language sentences with quoted code snippets and have problems related to large-size vocabulary. Second, we provide a module to normalize the input text including words and code tokens of the software issues. Third, we design an algorithm to convert an input software issue to a graph with different types of nodes and edges. Fourth, we construct a heterogeneous graph neural networks model with the support of fastText [6] for constructing initial node embedding to learn and predict the story points of new issues. Results. We did the comparison over three scenarios of estimation, including within project, cross-project within the repository, and cross-project cross repository with our baseline approaches. We achieve the average Mean Absolute Error (MAE) as 2.38, 2.61, and 2.63 for three scenarios. We outperform GPT2SP in 2/3 of the scenarios while outperforming Deep-SE in the most challenging scenario with significantly less amount of running time. We also compare our approaches with different homogeneous graph neural network models and the results show that the heterogeneous graph neural networks model outperforms the homogeneous models in story point estimation. For time performance, we achieve about 570 seconds as the time performance in both three processes: node embedding initialization, model construction, and story point estimation. HeterpSP’s artifacts are available at [22]. Conclusion. HeteroSP, a heterogeneous graph neural networks model for story point estimation, achieved good accuracy and running time.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133307144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leonardo Barbosa, V. H. S. C. Pinto, A. Souza, G. Pinto
Background: Cognitive-Driven Development (CDD) is a coding design technique that aims to reduce developers’ cognitive effort in understanding a given code unit (e.g., a class). By following CDD design practices, code units are expected to be smaller and, thus, easier to maintain and evolve. However, it is still unknown whether these smaller code units coded using CDD standards are easier to understand. Aims: This work aims to assess how much CDD improves code readability. Method: To achieve this goal, we conducted a two-phase study. We started by inviting professional software developers to vote (and justify their rationale) on the more readable snippet in each of 10 pairs of code snippets; one snippet in each pair was coded using CDD practices. We received 133 answers. In the second phase, we applied a state-of-the-art readability model to the 10 pairs of CDD-driven refactorings. Results: We observed some conflicting results. On the one hand, developers perceived seven (out of 10) CDD-driven refactorings as more readable than their counterparts; for two other CDD-driven refactorings, developers were undecided, and for only one did they prefer the original code snippet. On the other hand, we noticed that only one CDD-driven refactoring scored as more readable according to the state-of-the-art readability model. Conclusions: Our results provide initial evidence that CDD could be an exciting approach for software design.
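For intuition on what a feature-based readability model computes, here is a toy scorer over surface features in the spirit of classic readability models (e.g., Buse and Weimer's); the features and weights are made up for illustration and this is not the state-of-the-art model the authors applied.

```python
import re

def readability_score(snippet: str) -> float:
    """Toy readability score: higher means 'easier to read'. Weights are
    invented, not calibrated on any dataset."""
    lines = snippet.splitlines() or [""]
    identifiers = re.findall(r"[A-Za-z_]\w*", snippet)
    avg_line_len = sum(len(l) for l in lines) / len(lines)
    avg_ident_len = (sum(len(i) for i in identifiers) / len(identifiers)
                     if identifiers else 0.0)
    max_indent = max(len(l) - len(l.lstrip(" ")) for l in lines)
    comment_ratio = sum(l.lstrip().startswith(("//", "#")) for l in lines) / len(lines)
    # Shorter lines, shallower nesting, longer-but-not-cryptic identifiers,
    # and more comments read as "easier" under this toy model.
    return (10.0 - 0.05 * avg_line_len - 0.2 * max_indent
            - 0.1 * avg_ident_len + 2.0 * comment_ratio)

cdd_version = "int totalPrice(List<Integer> prices) { return sum(prices); }"
original_version = "int t(List<Integer> a) { return s(a); }"
print(readability_score(cdd_version), readability_score(original_version))
```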
{"title":"To What Extent Cognitive-Driven Development Improves Code Readability?","authors":"Leonardo Barbosa, V. H. S. C. Pinto, A. Souza, G. Pinto","doi":"10.1145/3544902.3546241","DOIUrl":"https://doi.org/10.1145/3544902.3546241","url":null,"abstract":"Background: Cognitive-Driven Development (CDD) is a coding design technique that aims to reduce developers’ cognitive effort in understanding a given code unit (e.g., a class). By following CDD design practices, it is expected that the coding units to be smaller and, thus, easier to maintain and evolve. However, it is so unknown whether these smaller code units coded using CDD standards are easier to understand. Aims: This work aims to assess how much CDD improves code readability. Method: To achieve this goal, we conducted a two-phase study. We start by inviting professional software developers to vote (and justify their rationale) on the most readable pair of code snippets (from a set of 10 pairs); one of the pairs was coded using CDD practices. We received 133 answers. In the second phase, we applied the state-of-the-art readability model to the 10-pairs of CDD-driven refactorings. Results: We observed some conflicting results. On the one hand, developers perceived that seven (out of 10) CDD-driven refactorings were more readable than their counterparts; for two other CDD-driven refactorings, developers were undecided, while only in one of the CDD-driven refactorings, developers preferred the original code snippet. On the other hand, we noticed that only one CDD-driven refactorings has better performance readability, assessed by state-of-the-art readability models. Conclusions: Our results provide initial evidence that CDD could be an exciting approach for software design.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124659836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}