Analyzing the Relationship between Community and Design Smells in Open-Source Software Projects: An Empirical Study
Haris Mumtaz, Paramvir Singh, Kelly Blincoe. DOI: 10.1145/3544902.3546249

Background: Software smells reflect sub-optimal patterns in software. Similarly, community smells capture sub-optimal patterns in the organizational and social structures of software teams. Related work has empirically studied the relationship between community smells and software smells at the architecture and code levels; however, how community smells relate to design smells remains unknown. Aims: In this paper, we empirically investigate the relationship between community smells and design smells during the evolution of software projects. Method: We apply three statistical methods (correlation, trend, and information gain analysis) to empirically examine the relationship between community and design smells in 100 releases of 10 large-scale Apache open-source software projects. Results: Our results reveal that the relationship between community and design smells varies across the analyzed projects. We find significant correlations and trend similarities between one type of community smell (Missing Links, which occurs when developers work in isolation without peer communication) and design smells in most of the analyzed projects. Furthermore, our statistical model discloses that community smells are more relevant to design smells than other community-related factors. Conclusion: We find that community smells (in particular, the Missing Links smell) are related to design smells. Based on our findings, we discuss specific community smell refactoring techniques that should be applied together with design smell refactoring, so that the social and technical (design) problems of a project can be managed concurrently.
{"title":"Analyzing the Relationship between Community and Design Smells in Open-Source Software Projects: An Empirical Study","authors":"Haris Mumtaz, Paramvir Singh, Kelly Blincoe","doi":"10.1145/3544902.3546249","DOIUrl":"https://doi.org/10.1145/3544902.3546249","url":null,"abstract":"Background: Software smells reflect the sub-optimal patterns in the software. In a similar way, community smells consider the sub-optimal patterns in the organizational and social structures of software teams. Related work performed empirical studies to identify the relationship between community smells and software smells at the architecture and code levels. However, how community smells relate with design smells is still unknown. Aims: In this paper, we empirically investigate the relationship between community smells and design smells during the evolution of software projects. Method: We apply three statistical methods: correlation, trend, and information gain analysis to empirically examine the relationship between community and design smells in 100 releases of 10 large-scale Apache open-source software projects. Results: Our results reveal that the relationship between community and design smells varies across the analyzed projects. We find significant correlations and trend similarities for one type of community smell (when developers work in isolation without peer communication—Missing Links) with design smells in most of the analyzed projects. Furthermore, the results of our statistical model disclose that community smells are more relevant for design smells compared to other community-related factors. Conclusion: Our results find that the relationship of community smells (in particular, the Missing Links smell) exists with design smells. Based on our findings, we discuss specific community smell refactoring techniques that should be done together when refactoring design smells so that the problems associated with the social and technical (design) aspects of the projects can be managed concurrently.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115100745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Do Static Analysis Tools Affect Software Quality when Using Test-driven Development?
Simone Romano, Fiorella Zampetti, M. T. Baldassarre, M. D. Penta, G. Scanniello. DOI: 10.1145/3544902.3546233
Background. Test-Driven Development (TDD) is an agile software development practice that encourages developers to write “quick-and-dirty” production code to make tests pass, and then apply refactoring to “clean” the written code. However, previous studies have found that refactoring is not applied as often as the TDD process requires, potentially affecting software quality. Aims. We investigated the benefits, for software quality, of leveraging a Static Analysis Tool (SAT) plugged into the Integrated Development Environment (IDE) when applying TDD. Method. We conducted two controlled experiments, in which the participants (92 in total) performed an implementation task by applying TDD with or without a SAT highlighting the presence of code smells in their source code. We then analyzed the effect of the SAT on software quality. Results. We found that, overall, the use of a SAT helped the participants significantly improve software quality, yet the participants perceived TDD as more difficult to perform. Conclusions. The obtained results may impact: (i) practitioners, helping them improve their TDD practice through the adoption of proper settings and tools; (ii) educators, in better introducing TDD within their courses; and (iii) researchers, interested in developing better tool support for developers or in further studying TDD.
{"title":"Do Static Analysis Tools Affect Software Quality when Using Test-driven Development?","authors":"Simone Romano, Fiorella Zampetti, M. T. Baldassarre, M. D. Penta, G. Scanniello","doi":"10.1145/3544902.3546233","DOIUrl":"https://doi.org/10.1145/3544902.3546233","url":null,"abstract":"Background. Test-Driven Development (TDD) is an agile software development practice, which encourages developers to write “quick-and-dirty” production code to make tests pass, and then apply refactoring to “clean” written code. However, previous studies have found that refactoring is not applied as often as the TDD process requires, potentially affecting software quality. Aims. We investigated the benefits of leveraging a Static Analysis Tool (SAT)—plugged-in the Integrated Development Environment (IDE)—on software quality, when applying TDD. Method. We conducted two controlled experiments, in which the participants—92, in total—performed an implementation task by applying TDD with or without a SAT highlighting the presence of code smells in their source code. We then analyzed the effect of the used SAT on software quality. Results. We found that, overall, the use of a SAT helped the participants to significantly improve software quality, yet the participants perceived TDD more difficult to be performed. Conclusions. The obtained results may impact: (i) practitioners, helping them improve their TDD practice through the adoption of proper settings and tools; (ii) educators, in better introducing TDD within their courses; and (iii) researchers, interested in developing better tool support for developers, or further studying TDD.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129013224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Empirical Study on the Occurrences of Code Smells in Open Source and Industrial Projects
Md. Masudur Rahman, A. Satter, Md. Mahbubul Alam Joarder, K. Sakib. DOI: 10.1145/3544902.3546634
Background: Reusing source code that contains code smells can incur significant maintenance time and cost. A list of code smells has been identified in the literature, and developers are encouraged to avoid them from the very beginning when writing new code or reusing existing code, because identifying and refactoring smelly code after a system has been developed increases time and cost. However, remembering a long list of smells is difficult, especially for new developers. Moreover, the two types of software development environment, open source and industry, might affect the occurrences of code smells. Aims: A study of the occurrences of code smells in open-source and industrial systems can provide insights into the most frequently occurring smells in each type of system. These insights can make developers aware of the most frequent smells and help researchers prioritize the improvement and innovation of automatic refactoring tools and techniques for those smells. Method: We conducted a study of 18 code smells across 40 large-scale Java systems, of which 25 are open source and 15 are industrial. Results: The results show that 6 smells did not occur in any system, while the remaining 12 smells occurred 21,182 times in total: 60.66% in the open-source systems and 39.34% in the industrial systems. Long Method, Complex Class, and Long Parameter List were the most frequently occurring code smells. A one-tailed t-test at the 5% significance level showed no difference between the occurrences of 10 code smells in industrial and open-source systems, while 2 smells occurred more frequently in open-source systems than in industrial systems. Conclusions: Our findings show that not all smells occur at the same frequency and that some smells are very frequent. The short list of the most frequently occurring smells can help developers write or reuse source code carefully without introducing these smells during software development. Our study also concludes that the industry and open-source environments do not have a significant impact on the occurrences of code smells.
{"title":"An Empirical Study on the Occurrences of Code Smells in Open Source and Industrial Projects","authors":"Md. Masudur Rahman, A. Satter, Md. Mahbubul Alam Joarder, K. Sakib","doi":"10.1145/3544902.3546634","DOIUrl":"https://doi.org/10.1145/3544902.3546634","url":null,"abstract":"Background: Reusing source code containing code smells can induce significant amount of maintenance time and cost. A list of code smells has been identified in the literature and developers are encouraged to avoid the smells from the very beginning while writing new code or reusing existing code, and it increases time and cost to identify and refactor the code after the development of a system. Again, remembering a long list of smells is difficult specially for the new developers. Besides, two different types of software development environment - open source and industry, might have an effect on the occurrences of code smells. Aims: A study on the occurrences of code smells in open source and industrial systems can provide insights about the most frequently occurring smells in each type of software system. The insights can make developers aware of the most frequent occurring smells, and researchers to focus on the improvement and innovation of automatic refactoring tools or techniques for the smells on priority basis. Method: We have conducted a study on 40 large scale Java systems, where 25 are open source and 15 are industrial systems, for 18 code smells. Results: The results show that 6 smells have not occurred in any system, and 12 smells have occurred 21,182 times in total where 60.66% in the open source systems and 39.34% in the industrial systems. Long Method, Complex Class and Long Parameter List have been seen as frequently occurring code smells. The one tailed t-test with 5% level of significant analysis has shown that there is no difference between the occurrences of 10 code smells in industrial and open source systems, and 2 smells are occurred more frequently in open source systems than industrial systems. Conclusions: Our findings conclude that all smells do not occur at the same frequency and some smells are very frequent. The short list of most frequently occurred smells can help developers to write or reuse source code carefully without inducing the smells from the beginning during software development. Our study also concludes that industry and open source environments do not have significant impact on the occurrences of code smells.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133792892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Preliminary Investigation of MLOps Practices in GitHub
Fabio Calefato, F. Lanubile, L. Quaranta. DOI: 10.1145/3544902.3546636

Background. The rapid and growing popularity of machine learning (ML) applications has led to increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) for ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions for automating the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. We also identify issues that can guide future research.
{"title":"A Preliminary Investigation of MLOps Practices in GitHub","authors":"Fabio Calefato, F. Lanubile, L. Quaranta","doi":"10.1145/3544902.3546636","DOIUrl":"https://doi.org/10.1145/3544902.3546636","url":null,"abstract":"Background. The rapid and growing popularity of machine learning (ML) applications has led to an increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions to automate the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. Issues are also identified, which can guide future research work.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115858321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Relationship Between Story Points and Development Effort in Agile Open-Source Software
Vali Tawosi, Federica Sarro. DOI: 10.1145/3544902.3546238

Background: Previous work has provided some initial evidence that Story Points (SP) estimated by human experts may not accurately reflect the effort needed to realise Agile software projects. Aims: In this paper, we aim to shed further light on the relationship between SP and Agile software development effort, to understand the extent to which human-estimated SP is a good indicator of user story development effort expressed in terms of the time needed to realise it. Method: To this end, we carry out a thorough empirical study involving a total of 37,440 unique user stories from 37 different open-source projects publicly available in the TAWOS dataset. For these user stories, we investigate the correlation between the issue development time (or its approximation when the actual time is not available) and the human-estimated SP using three widely used correlation statistics (Pearson, Kendall, and Spearman). Furthermore, we investigate the SP estimations made by the human experts to assess the extent to which they are consistent throughout the project, i.e., whether the development time of the issues is proportionate to the SP assigned to them. Results: The average results across the three correlation measures reveal that the correlation between the human-estimated SP and the approximated development time is strong for only 7% of the projects investigated, and medium (58%) or low (35%) for the remaining ones. Similar results are obtained when the actual development time is considered. Our empirical study also reveals that the estimations are often not consistent throughout the project, and the human estimator tends to misestimate in 78% of the cases. Conclusions: Our empirical results suggest that SP might not be an accurate indicator of open-source Agile software development effort expressed in terms of development time. The impact of its use as an indicator of effort should be explored in future work, for example as a cost driver in automated effort estimation models or as the prediction target.
{"title":"On the Relationship Between Story Points and Development Effort in Agile Open-Source Software","authors":"Vali Tawosi, Federica Sarro","doi":"10.1145/3544902.3546238","DOIUrl":"https://doi.org/10.1145/3544902.3546238","url":null,"abstract":"Background: Previous work has provided some initial evidence that Story Point (SP) estimated by human-experts may not accurately reflect the effort needed to realise Agile software projects. Aims: In this paper, we aim to shed further light on the relationship between SP and Agile software development effort to understand the extent to which human-estimated SP is a good indicator of user story development effort expressed in terms of time needed to realise it. Method: To this end, we carry out a thorough empirical study involving a total of 37,440 unique user stories from 37 different open-source projects publicly available in the TAWOS dataset. For these user stories, we investigate the correlation between the issue development time (or its approximation when the actual time is not available) and the SP estimated by human-expert by using three widely-used correlation statistics (i.e., Pearson, Kendall and Spearman). Furthermore, we investigate SP estimations made by the human-experts in order to assess the extent to which they are consistent in their estimations throughout the project, i.e., we assess whether the development time of the issues is proportionate to the SP assigned to them. Results: The average results across the three correlation measures reveal that the correlation between the human-expert estimated SP and the approximated development time is strong for only 7% of the projects investigated, and medium (58%) or low (35%) for the remaining ones. Similar results are obtained when the actual development time is considered. Our empirical study also reveals that the estimation made is often not consistent throughout the project and the human estimator tends to misestimate in 78% of the cases. Conclusions: Our empirical results suggest that SP might not be an accurate indicator of open-source Agile software development effort expressed in terms of development time. The impact of its use as an indicator of effort should be explored in future work, for example as a cost-driver in automated effort estimation models or as the prediction target.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122762555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Android API Field Evolution and Its Induced Compatibility Issues
Tarek Mahmud, Meiru Che, Guowei Yang. DOI: 10.1145/3544902.3546242

Background: The continuous evolution of the Android operating system necessitates regular API updates, which may affect the functionality of Android apps. Recent studies have investigated API evolution to ensure the reliability of Android apps; however, they focused on API methods alone. Aim: We aim to empirically investigate how Android API fields evolve and how this evolution affects the compatibility of Android apps. Method: We conducted a study based on real-world app development history data involving 11,098 tags from 105 popular open-source Android apps. Results: Our study yields interesting findings: e.g., on average, two API field compatibility issues exist per app; different types of checks are preferred when addressing different types of compatibility issues; and fixing compatibility issues induced by API field evolution takes more time than fixing compatibility issues induced by API method evolution. Conclusion: These findings will help developers and researchers better understand, detect, and handle Android compatibility issues induced by API field evolution.
{"title":"Android API Field Evolution and Its Induced Compatibility Issues","authors":"Tarek Mahmud, Meiru Che, Guowei Yang","doi":"10.1145/3544902.3546242","DOIUrl":"https://doi.org/10.1145/3544902.3546242","url":null,"abstract":"Background: The continuous evolution of the Android operating system necessitates regular API updates, which may affect the functionality of Android apps. Recent studies investigated API evolution to ensure the reliability of Android apps; however, they focused on API methods alone. Aim: We aim to empirically investigate how Android API fields evolve, and how this evolution affects the compatibility of Android apps. Method: We conducted a study based on real-world app development history data involving 11098 tags out of 105 popular open-source Android apps. Results: Our study yields interesting findings, e.g., on average two API field compatibility issues exist per app, different types of checks are preferred when addressing different types of compatibility issues, and fixing compatibility issues induced by API field evolution takes more time than fixing compatibility issues induced by API method evolution. Conclusion: These findings will help developers and researchers better understand, detect, and handle Android compatibility issues induced by API field evolution.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129805951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Analysis of Bug-Fixing Time using Report Data
Renan Vieira, Diego Mesquita, C. Mattos, Ricardo Britto, Lincoln S. Rocha, J. Gomes. DOI: 10.1145/3544902.3546256
Background: Bug-fixing is the crux of software maintenance. It entails tending to heaps of bug reports using limited resources. Using historical data, we can ask questions that contribute to better-informed allocation heuristics. The caveat is that often there is not enough data to provide a sound answer. This issue is especially prominent for young projects. Also, answers may vary from project to project. Consequently, it is impossible to generalize results without assuming a notion of relatedness between projects. Aims: Evaluate the independent impact of three report features on the bug-fixing time (BFT), generalizing results across many projects: bug priority, code-churn size in bug-fixing commits, and the existence of links to other reports (e.g., depends on or blocks other bug reports). Method: We analyze 55 projects from the Apache ecosystem using Bayesian statistics. Similar to standard random-effects methodology, we assume each project’s average BFT is a dispersed version of a global average BFT that we want to assess. We split the data based on feature values/ranges (e.g., with or without links). For each split, we compute a posterior distribution over its respective global BFT. Finally, we compare the posteriors to establish the feature’s effect on the BFT, running independent analyses for each feature. Results: Our results show that the existence of links and higher code-churn values lead to BFTs that are at least twice as long. On the other hand, considering three levels of priority (low, medium, and high), we observe no difference in the BFT. Conclusion: To the best of our knowledge, this is the first study using hierarchical Bayes to extrapolate results from multiple projects and assess the global effect of different attributes on the BFT. We use this methodology to gain insight into how links, priority, and code-churn size impact the BFT. On top of that, our posteriors can be used as priors when analyzing novel projects, which may be young and scarce on data. We also believe our methodology can be reused for other generalization studies in empirical software engineering.
{"title":"Bayesian Analysis of Bug-Fixing Time using Report Data","authors":"Renan Vieira, Diego Mesquita, C. Mattos, Ricardo Britto, Lincoln S. Rocha, J. Gomes","doi":"10.1145/3544902.3546256","DOIUrl":"https://doi.org/10.1145/3544902.3546256","url":null,"abstract":"Background: Bug-fixing is the crux of software maintenance. It entails tending to heaps of bug reports using limited resources. Using historical data, we can ask questions that contribute to better-informed allocation heuristics. The caveat here is that often there is not enough data to provide a sound response. This issue is especially prominent for young projects. Also, answers may vary from project to project. Consequently, it is impossible to generalize results without assuming a notion of relatedness between projects. Aims: Evaluate the independent impact of three report features in the bug-fixing time (BFT), generalizing results from many projects: bug priority, code-churn size in bug fixing commits, and existence of links to other reports (e.g., depends on or blocks other bug reports). Method: We analyze 55 projects from the Apache ecosystem using Bayesian statistics. Similar to standard random effects methodology, we assume each project’s average BFT is a dispersed version of a global average BFT that we want to assess. We split the data based on feature values/range (e.g., with or without links). For each split, we compute a posterior distribution over its respective global BFT. Finally, we compare the posteriors to establish the feature’s effect on the BFT. We run independent analyses for each feature. Results: Our results show that the existence of links and higher code-churn values lead to BFTs that are at least twice as long. On the other hand, considering three levels of priority (low, medium, and high), we observe no difference in the BFT. Conclusion: To the best of our knowledge, this is the first study using hierarchical Bayes to extrapolate results from multiple projects and assess the global effect of different attributes on the BFT. We use this methodology to gain insight on how links, priority, and code-churn size impact the BFT. On top of that, our posteriors can be used as a prior to analyze novel projects, potentially young and scarce on data. We also believe our methodology can be reused for other generalization studies in empirical software engineering.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128271367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterizing the Usage of CI Tools in ML Projects
D. Rzig, Foyzul Hassan, Chetan Bansal, Nachiappan Nagappan. DOI: 10.1145/3544902.3546237
Background: Continuous Integration (CI) has become widely adopted to enable faster code change integration. Meanwhile, Machine Learning (ML) is being used by software applications for previously unsolvable real-world scenarios. ML projects employ development processes different from those of traditional software projects, but they too require multiple iterations in their development and may benefit from CI. Aims: While there are many works covering CI within traditional software, none of them has empirically explored the adoption of CI and its associated issues within ML projects. To address this knowledge gap, we performed an empirical analysis comparing CI adoption between ML and Non-ML projects. Method: We developed TraVanalyzer, the first Travis CI configuration analyzer, to analyze the CI practices of ML projects, and a CI log analyzer to identify the different CI problems of ML projects. Results: We found that Travis CI is the most popular CI tool for ML projects and that their CI adoption lags behind that of Non-ML projects, but ML projects that adopted CI used it for building, testing, code analysis, and automatic deployment more than Non-ML projects did. Furthermore, while CI in ML projects is as likely to experience problems as CI in Non-ML projects, it has more varied reasons for build breakage. The most frequent CI failures of ML projects are due to testing-related problems, similar to Non-ML and OSS CI failures. Conclusion: To the best of our knowledge, this is the first work to analyze ML projects’ CI usage, practices, and issues, and to contextualize its results by comparing them with similar Non-ML projects. It provides findings for researchers and ML developers to identify possible improvement scopes for CI in ML projects.
{"title":"Characterizing the Usage of CI Tools in ML Projects","authors":"D. Rzig, Foyzul Hassan, Chetan Bansal, Nachiappan Nagappan","doi":"10.1145/3544902.3546237","DOIUrl":"https://doi.org/10.1145/3544902.3546237","url":null,"abstract":"Background: Continuous Integration (CI) has become widely adopted to enable faster code change integration. Meanwhile, Machine Learning (ML) is being used by software applications for previously unsolvable real-world scenarios. ML projects employ development processes different from those of traditional software projects, but they too require multiple iterations in their development, and may benefit from CI. Aims: While there are many works covering CI within traditional software, none of them empirically explored the adoption of CI and its associated issues within ML projects. To address this knowledge gap, we performed an empirical analysis comparing CI adoption between ML and Non-ML projects. Method: We developed TraVanalyzer, the first Travis CI configuration analyzer, to analyze the CI practices of ML projects, and developed a CI log analyzer to identify the different CI problems of ML projects. Results: We found that Travis CI is the most popular CI tool for ML projects, and that their CI adoption lags behind that of Non-ML projects, but that ML projects which adopted CI, used it for building, testing, code analysis, and automatic deployment more than Non-ML projects. Furthermore, while CI in ML projects is as likely to experience problems as CI in Non-ML projects, it has more varied reasons for build-breakage. The most frequent CI failures of ML projects are due to testing-related problems, similar to Non-ML and OSS CI failures. Conclusion: To the best of our knowledge, this is the first work that has analyzed ML projects’ CI usage, practices, and issues, and contextualized its results by comparing them with similar Non-ML projects. It provides findings for researchers and ML developers to identify possible improvement scopes for CI in ML projects.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130527973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Does Collaborative Editing Help Mitigate Security Vulnerabilities in Crowd-Shared IoT Code Examples?
Madhu Selvaraj, Gias Uddin. DOI: 10.1145/3544902.3546235

Background: With the proliferation of crowd-sourced developer forums, software developers are increasingly sharing coding solutions to programming problems with others. The decentralized nature of knowledge sharing on these sites has raised concerns about the sharing of security-vulnerable code, which can then be reused in mission-critical software systems, making those systems vulnerable in the process. Collaborative editing has been introduced in forums like Stack Overflow to improve the quality of shared content. Aim: In this paper, we investigate whether code editing can mitigate shared vulnerable code examples by analyzing IoT code snippets and their revisions in three Stack Exchange sites: Stack Overflow, Arduino, and Raspberry Pi. Method: We analyze the vulnerabilities present in shared IoT C/C++ code snippets, as C/C++ is one of the most widely used languages in mission-critical and low-powered IoT devices. We further analyze the revisions made to these code snippets and their effects. Results: We find several vulnerabilities, such as CWE-788 (Access of Memory Location After End of Buffer), in 740 code snippets. However, we find that the vast majority of posts are not revised, or their revisions do not touch the code snippets themselves (598 out of 740). We also find that revisions are most likely to leave the number of vulnerabilities in a code snippet unchanged rather than deteriorating or improving the snippet. Conclusions: We conclude that the current collaborative editing system in the forums may be insufficient to help mitigate vulnerabilities in the shared code.
{"title":"Does Collaborative Editing Help Mitigate Security Vulnerabilities in Crowd-Shared IoT Code Examples?","authors":"Madhu Selvaraj, Gias Uddin","doi":"10.1145/3544902.3546235","DOIUrl":"https://doi.org/10.1145/3544902.3546235","url":null,"abstract":"Background: With the proliferation of crowd-sourced developer forums, Software developers are increasingly sharing more coding solutions to programming problems with others in forums. The decentralized nature of knowledge sharing on sites has raised the concern of sharing security vulnerable code, which then can be reused into mission critical software systems - making those systems vulnerable in the process. Collaborative editing has been introduced in forums like Stack Overflow to improve the quality of the shared contents. Aim: In this paper, we investigate whether code editing can mitigate shared vulnerable code examples by analyzing IoT code snippets and their revisions in three Stack Exchange sites: Stack Overflow, Arduino, and Raspberry Pi. Method:We analyze the vulnerabilities present in shared IoT C/C++ code snippets, as C/C++ is one of the most widely used languages in mission-critical devices and low-powered IoT devices. We further analyse the revisions made to these code snippets, and their effects. Results: We find several vulnerabilities such as CWE 788 - Access of Memory Location After End of Buffer , in 740 code snippets. However, we find the vast majority of posts are not revised, or revisions are not made to the code snippets themselves (598 out of 740). We also find that revisions are most likely to result in no change to the number of vulnerabilities in a code snippet rather than deteriorating or improving the snippet. Conclusions: We conclude that the current collaborative editing system in the forums may be insufficient to help mitigate vulnerabilities in the shared code.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"23 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134395956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Experience Report on Technical Debt in Pull Requests: Challenges and Lessons Learned
Shubhashis Karmakar, Zadia Codabux, M. Vidoni. DOI: 10.1145/3544902.3546637

Background: GitHub is a collaborative platform for global software development, where Pull Requests (PRs) are essential to bridge code changes with version control. However, developers often trade software quality for faster implementation, incurring Technical Debt (TD). When developers take on reviewers’ roles and evaluate PRs, they can often detect TD instances, leading to either PR rejection or discussion. Aims: We investigated whether Pull Request Comments (PRCs) indicate TD by assessing three large-scale repositories: Spark, Kafka, and React. Method: We combined manual classification with automated detection using machine learning and deep learning models. Results: We classified two datasets and found that 37.7% and 38.7% of PRCs indicate TD, respectively. Our best model achieved F1 = 0.85 when classifying TD during the validation phase. Conclusions: We faced several challenges during this process, which may hint that TD in PRCs is discussed differently than in other software artifacts (e.g., code comments, commits, issues, or discussion forums). Thus, we present challenges and lessons learned to assist researchers in pursuing this area of research.
{"title":"An Experience Report on Technical Debt in Pull Requests: Challenges and Lessons Learned","authors":"Shubhashis Karmakar, Zadia Codabux, M. Vidoni","doi":"10.1145/3544902.3546637","DOIUrl":"https://doi.org/10.1145/3544902.3546637","url":null,"abstract":"Background: GitHub is a collaborative platform for global software development, where Pull Requests (PRs) are essential to bridge code changes with version control. However, developers often trade software quality for faster implementation, incurring Technical Debt (TD). When developers undertake reviewers’ roles and evaluate PRs, they can often detect TD instances, leading to either PR rejection or discussions. Aims: We investigated whether Pull Request Comments (PRCs) indicate TD by assessing three large-scale repositories: Spark, Kafka, and React. Method: We combined manual classification with automated detection using machine learning and deep learning models. Results: We classified two datasets and found that 37.7 and 38.7% of PRCs indicate TD, respectively. Our best model achieved F1 = 0.85 when classifying TD during the validation phase. Conclusions: We faced several challenges during this process, which may hint that TD in PRCs is discussed differently from other software artifacts (e.g., code comments, commits, issues, or discussion forums). Thus, we present challenges and lessons learned to assist researchers in pursuing this area of research.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"200 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134041006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}