Analyzing the Relationship between Community and Design Smells in Open-Source Software Projects: An Empirical Study
Haris Mumtaz, Paramvir Singh, Kelly Blincoe. DOI: 10.1145/3544902.3546249

Background: Software smells reflect sub-optimal patterns in software. Similarly, community smells capture sub-optimal patterns in the organizational and social structures of software teams. Related work has empirically studied the relationship between community smells and software smells at the architecture and code levels; however, how community smells relate to design smells remains unknown. Aims: In this paper, we empirically investigate the relationship between community smells and design smells during the evolution of software projects. Method: We apply three statistical methods (correlation, trend, and information gain analysis) to empirically examine the relationship between community and design smells in 100 releases of 10 large-scale Apache open-source software projects. Results: Our results reveal that the relationship between community and design smells varies across the analyzed projects. We find significant correlations and trend similarities between one type of community smell (Missing Links, which occurs when developers work in isolation without peer communication) and design smells in most of the analyzed projects. Furthermore, our statistical model discloses that community smells are more relevant to design smells than other community-related factors. Conclusion: We find that community smells (in particular, the Missing Links smell) are related to design smells. Based on our findings, we discuss specific community smell refactoring techniques that should be applied together with design smell refactoring, so that the social and technical (design) problems of a project can be managed concurrently.
{"title":"Analyzing the Relationship between Community and Design Smells in Open-Source Software Projects: An Empirical Study","authors":"Haris Mumtaz, Paramvir Singh, Kelly Blincoe","doi":"10.1145/3544902.3546249","DOIUrl":"https://doi.org/10.1145/3544902.3546249","url":null,"abstract":"Background: Software smells reflect the sub-optimal patterns in the software. In a similar way, community smells consider the sub-optimal patterns in the organizational and social structures of software teams. Related work performed empirical studies to identify the relationship between community smells and software smells at the architecture and code levels. However, how community smells relate with design smells is still unknown. Aims: In this paper, we empirically investigate the relationship between community smells and design smells during the evolution of software projects. Method: We apply three statistical methods: correlation, trend, and information gain analysis to empirically examine the relationship between community and design smells in 100 releases of 10 large-scale Apache open-source software projects. Results: Our results reveal that the relationship between community and design smells varies across the analyzed projects. We find significant correlations and trend similarities for one type of community smell (when developers work in isolation without peer communication—Missing Links) with design smells in most of the analyzed projects. Furthermore, the results of our statistical model disclose that community smells are more relevant for design smells compared to other community-related factors. Conclusion: Our results find that the relationship of community smells (in particular, the Missing Links smell) exists with design smells. Based on our findings, we discuss specific community smell refactoring techniques that should be done together when refactoring design smells so that the problems associated with the social and technical (design) aspects of the projects can be managed concurrently.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115100745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Do Static Analysis Tools Affect Software Quality when Using Test-driven Development?
Simone Romano, Fiorella Zampetti, M. T. Baldassarre, M. D. Penta, G. Scanniello. DOI: 10.1145/3544902.3546233
Background. Test-Driven Development (TDD) is an agile software development practice that encourages developers to write “quick-and-dirty” production code to make tests pass, and then apply refactoring to “clean” the written code. However, previous studies have found that refactoring is not applied as often as the TDD process requires, potentially affecting software quality. Aims. We investigated the benefits, for software quality, of leveraging a Static Analysis Tool (SAT) plugged into the Integrated Development Environment (IDE) when applying TDD. Method. We conducted two controlled experiments, in which the participants (92 in total) performed an implementation task by applying TDD with or without a SAT highlighting the presence of code smells in their source code. We then analyzed the effect of the SAT on software quality. Results. We found that, overall, the use of a SAT helped the participants significantly improve software quality, yet the participants perceived TDD as more difficult to perform. Conclusions. The obtained results may impact: (i) practitioners, helping them improve their TDD practice through the adoption of proper settings and tools; (ii) educators, in better introducing TDD within their courses; and (iii) researchers, interested in developing better tool support for developers or in further studying TDD.
{"title":"Do Static Analysis Tools Affect Software Quality when Using Test-driven Development?","authors":"Simone Romano, Fiorella Zampetti, M. T. Baldassarre, M. D. Penta, G. Scanniello","doi":"10.1145/3544902.3546233","DOIUrl":"https://doi.org/10.1145/3544902.3546233","url":null,"abstract":"Background. Test-Driven Development (TDD) is an agile software development practice, which encourages developers to write “quick-and-dirty” production code to make tests pass, and then apply refactoring to “clean” written code. However, previous studies have found that refactoring is not applied as often as the TDD process requires, potentially affecting software quality. Aims. We investigated the benefits of leveraging a Static Analysis Tool (SAT)—plugged-in the Integrated Development Environment (IDE)—on software quality, when applying TDD. Method. We conducted two controlled experiments, in which the participants—92, in total—performed an implementation task by applying TDD with or without a SAT highlighting the presence of code smells in their source code. We then analyzed the effect of the used SAT on software quality. Results. We found that, overall, the use of a SAT helped the participants to significantly improve software quality, yet the participants perceived TDD more difficult to be performed. Conclusions. The obtained results may impact: (i) practitioners, helping them improve their TDD practice through the adoption of proper settings and tools; (ii) educators, in better introducing TDD within their courses; and (iii) researchers, interested in developing better tool support for developers, or further studying TDD.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129013224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Empirical Study on the Occurrences of Code Smells in Open Source and Industrial Projects
Md. Masudur Rahman, A. Satter, Md. Mahbubul Alam Joarder, K. Sakib. DOI: 10.1145/3544902.3546634
Background: Reusing source code that contains code smells can incur significant maintenance time and cost. A list of code smells has been identified in the literature, and developers are encouraged to avoid them from the very beginning when writing new code or reusing existing code, because identifying and refactoring smelly code after a system has been developed increases time and cost. However, remembering a long list of smells is difficult, especially for new developers. Moreover, the two types of software development environment, open source and industry, might affect the occurrences of code smells. Aims: A study of the occurrences of code smells in open-source and industrial systems can provide insights into the most frequently occurring smells in each type of system. These insights can make developers aware of the most frequent smells and help researchers prioritize the improvement and innovation of automatic refactoring tools and techniques for those smells. Method: We conducted a study of 18 code smells across 40 large-scale Java systems, of which 25 are open source and 15 are industrial. Results: The results show that 6 smells did not occur in any system, while the remaining 12 smells occurred 21,182 times in total: 60.66% in the open-source systems and 39.34% in the industrial systems. Long Method, Complex Class, and Long Parameter List were the most frequently occurring code smells. A one-tailed t-test at the 5% significance level showed no difference between the occurrences of 10 code smells in industrial and open-source systems, while 2 smells occurred more frequently in open-source systems than in industrial systems. Conclusions: Our findings show that not all smells occur at the same frequency and that some smells are very frequent. The short list of the most frequently occurring smells can help developers write or reuse source code carefully without introducing these smells during software development. Our study also concludes that the industry and open-source environments do not have a significant impact on the occurrences of code smells.
{"title":"An Empirical Study on the Occurrences of Code Smells in Open Source and Industrial Projects","authors":"Md. Masudur Rahman, A. Satter, Md. Mahbubul Alam Joarder, K. Sakib","doi":"10.1145/3544902.3546634","DOIUrl":"https://doi.org/10.1145/3544902.3546634","url":null,"abstract":"Background: Reusing source code containing code smells can induce significant amount of maintenance time and cost. A list of code smells has been identified in the literature and developers are encouraged to avoid the smells from the very beginning while writing new code or reusing existing code, and it increases time and cost to identify and refactor the code after the development of a system. Again, remembering a long list of smells is difficult specially for the new developers. Besides, two different types of software development environment - open source and industry, might have an effect on the occurrences of code smells. Aims: A study on the occurrences of code smells in open source and industrial systems can provide insights about the most frequently occurring smells in each type of software system. The insights can make developers aware of the most frequent occurring smells, and researchers to focus on the improvement and innovation of automatic refactoring tools or techniques for the smells on priority basis. Method: We have conducted a study on 40 large scale Java systems, where 25 are open source and 15 are industrial systems, for 18 code smells. Results: The results show that 6 smells have not occurred in any system, and 12 smells have occurred 21,182 times in total where 60.66% in the open source systems and 39.34% in the industrial systems. Long Method, Complex Class and Long Parameter List have been seen as frequently occurring code smells. The one tailed t-test with 5% level of significant analysis has shown that there is no difference between the occurrences of 10 code smells in industrial and open source systems, and 2 smells are occurred more frequently in open source systems than industrial systems. Conclusions: Our findings conclude that all smells do not occur at the same frequency and some smells are very frequent. The short list of most frequently occurred smells can help developers to write or reuse source code carefully without inducing the smells from the beginning during software development. Our study also concludes that industry and open source environments do not have significant impact on the occurrences of code smells.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133792892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Preliminary Investigation of MLOps Practices in GitHub
Fabio Calefato, F. Lanubile, L. Quaranta. DOI: 10.1145/3544902.3546636

Background. The rapid and growing popularity of machine learning (ML) applications has led to increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) for ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions for automating the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. We also identify issues that can guide future research.
{"title":"A Preliminary Investigation of MLOps Practices in GitHub","authors":"Fabio Calefato, F. Lanubile, L. Quaranta","doi":"10.1145/3544902.3546636","DOIUrl":"https://doi.org/10.1145/3544902.3546636","url":null,"abstract":"Background. The rapid and growing popularity of machine learning (ML) applications has led to an increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions to automate the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. Issues are also identified, which can guide future research work.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115858321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Relationship Between Story Points and Development Effort in Agile Open-Source Software
Vali Tawosi, Federica Sarro. DOI: 10.1145/3544902.3546238

Background: Previous work has provided some initial evidence that Story Points (SP) estimated by human experts may not accurately reflect the effort needed to realise Agile software projects. Aims: In this paper, we aim to shed further light on the relationship between SP and Agile software development effort, to understand the extent to which human-estimated SP is a good indicator of user story development effort expressed in terms of the time needed to realise it. Method: To this end, we carry out a thorough empirical study involving a total of 37,440 unique user stories from 37 different open-source projects publicly available in the TAWOS dataset. For these user stories, we investigate the correlation between the issue development time (or its approximation when the actual time is not available) and the human-estimated SP using three widely used correlation statistics (Pearson, Kendall, and Spearman). Furthermore, we investigate the SP estimations made by the human experts to assess the extent to which they are consistent throughout the project, i.e., whether the development time of the issues is proportionate to the SP assigned to them. Results: The average results across the three correlation measures reveal that the correlation between the human-estimated SP and the approximated development time is strong for only 7% of the projects investigated, and medium (58%) or low (35%) for the remaining ones. Similar results are obtained when the actual development time is considered. Our empirical study also reveals that the estimations are often not consistent throughout the project, and the human estimator tends to misestimate in 78% of the cases. Conclusions: Our empirical results suggest that SP might not be an accurate indicator of open-source Agile software development effort expressed in terms of development time. The impact of its use as an indicator of effort should be explored in future work, for example as a cost driver in automated effort estimation models or as the prediction target.
{"title":"On the Relationship Between Story Points and Development Effort in Agile Open-Source Software","authors":"Vali Tawosi, Federica Sarro","doi":"10.1145/3544902.3546238","DOIUrl":"https://doi.org/10.1145/3544902.3546238","url":null,"abstract":"Background: Previous work has provided some initial evidence that Story Point (SP) estimated by human-experts may not accurately reflect the effort needed to realise Agile software projects. Aims: In this paper, we aim to shed further light on the relationship between SP and Agile software development effort to understand the extent to which human-estimated SP is a good indicator of user story development effort expressed in terms of time needed to realise it. Method: To this end, we carry out a thorough empirical study involving a total of 37,440 unique user stories from 37 different open-source projects publicly available in the TAWOS dataset. For these user stories, we investigate the correlation between the issue development time (or its approximation when the actual time is not available) and the SP estimated by human-expert by using three widely-used correlation statistics (i.e., Pearson, Kendall and Spearman). Furthermore, we investigate SP estimations made by the human-experts in order to assess the extent to which they are consistent in their estimations throughout the project, i.e., we assess whether the development time of the issues is proportionate to the SP assigned to them. Results: The average results across the three correlation measures reveal that the correlation between the human-expert estimated SP and the approximated development time is strong for only 7% of the projects investigated, and medium (58%) or low (35%) for the remaining ones. Similar results are obtained when the actual development time is considered. Our empirical study also reveals that the estimation made is often not consistent throughout the project and the human estimator tends to misestimate in 78% of the cases. Conclusions: Our empirical results suggest that SP might not be an accurate indicator of open-source Agile software development effort expressed in terms of development time. The impact of its use as an indicator of effort should be explored in future work, for example as a cost-driver in automated effort estimation models or as the prediction target.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122762555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Android API Field Evolution and Its Induced Compatibility Issues
Tarek Mahmud, Meiru Che, Guowei Yang. DOI: 10.1145/3544902.3546242

Background: The continuous evolution of the Android operating system necessitates regular API updates, which may affect the functionality of Android apps. Recent studies have investigated API evolution to ensure the reliability of Android apps; however, they focused on API methods alone. Aim: We aim to empirically investigate how Android API fields evolve and how this evolution affects the compatibility of Android apps. Method: We conducted a study based on real-world app development history data involving 11,098 tags from 105 popular open-source Android apps. Results: Our study yields interesting findings: e.g., on average, two API field compatibility issues exist per app; different types of checks are preferred when addressing different types of compatibility issues; and fixing compatibility issues induced by API field evolution takes more time than fixing compatibility issues induced by API method evolution. Conclusion: These findings will help developers and researchers better understand, detect, and handle Android compatibility issues induced by API field evolution.
{"title":"Android API Field Evolution and Its Induced Compatibility Issues","authors":"Tarek Mahmud, Meiru Che, Guowei Yang","doi":"10.1145/3544902.3546242","DOIUrl":"https://doi.org/10.1145/3544902.3546242","url":null,"abstract":"Background: The continuous evolution of the Android operating system necessitates regular API updates, which may affect the functionality of Android apps. Recent studies investigated API evolution to ensure the reliability of Android apps; however, they focused on API methods alone. Aim: We aim to empirically investigate how Android API fields evolve, and how this evolution affects the compatibility of Android apps. Method: We conducted a study based on real-world app development history data involving 11098 tags out of 105 popular open-source Android apps. Results: Our study yields interesting findings, e.g., on average two API field compatibility issues exist per app, different types of checks are preferred when addressing different types of compatibility issues, and fixing compatibility issues induced by API field evolution takes more time than fixing compatibility issues induced by API method evolution. Conclusion: These findings will help developers and researchers better understand, detect, and handle Android compatibility issues induced by API field evolution.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129805951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Analysis of Bug-Fixing Time using Report Data
Renan Vieira, Diego Mesquita, C. Mattos, Ricardo Britto, Lincoln S. Rocha, J. Gomes. DOI: 10.1145/3544902.3546256
Background: Bug-fixing is the crux of software maintenance. It entails tending to heaps of bug reports using limited resources. Using historical data, we can ask questions that contribute to better-informed allocation heuristics. The caveat is that often there is not enough data to provide a sound answer. This issue is especially prominent for young projects. Also, answers may vary from project to project. Consequently, it is impossible to generalize results without assuming a notion of relatedness between projects. Aims: Evaluate the independent impact of three report features on the bug-fixing time (BFT), generalizing results across many projects: bug priority, code-churn size in bug-fixing commits, and the existence of links to other reports (e.g., depends on or blocks other bug reports). Method: We analyze 55 projects from the Apache ecosystem using Bayesian statistics. Similar to standard random-effects methodology, we assume each project’s average BFT is a dispersed version of a global average BFT that we want to assess. We split the data based on feature values/ranges (e.g., with or without links). For each split, we compute a posterior distribution over its respective global BFT. Finally, we compare the posteriors to establish the feature’s effect on the BFT, running independent analyses for each feature. Results: Our results show that the existence of links and higher code-churn values lead to BFTs that are at least twice as long. On the other hand, considering three levels of priority (low, medium, and high), we observe no difference in the BFT. Conclusion: To the best of our knowledge, this is the first study using hierarchical Bayes to extrapolate results from multiple projects and assess the global effect of different attributes on the BFT. We use this methodology to gain insight into how links, priority, and code-churn size impact the BFT. On top of that, our posteriors can be used as priors when analyzing novel projects, which may be young and scarce on data. We also believe our methodology can be reused for other generalization studies in empirical software engineering.
{"title":"Bayesian Analysis of Bug-Fixing Time using Report Data","authors":"Renan Vieira, Diego Mesquita, C. Mattos, Ricardo Britto, Lincoln S. Rocha, J. Gomes","doi":"10.1145/3544902.3546256","DOIUrl":"https://doi.org/10.1145/3544902.3546256","url":null,"abstract":"Background: Bug-fixing is the crux of software maintenance. It entails tending to heaps of bug reports using limited resources. Using historical data, we can ask questions that contribute to better-informed allocation heuristics. The caveat here is that often there is not enough data to provide a sound response. This issue is especially prominent for young projects. Also, answers may vary from project to project. Consequently, it is impossible to generalize results without assuming a notion of relatedness between projects. Aims: Evaluate the independent impact of three report features in the bug-fixing time (BFT), generalizing results from many projects: bug priority, code-churn size in bug fixing commits, and existence of links to other reports (e.g., depends on or blocks other bug reports). Method: We analyze 55 projects from the Apache ecosystem using Bayesian statistics. Similar to standard random effects methodology, we assume each project’s average BFT is a dispersed version of a global average BFT that we want to assess. We split the data based on feature values/range (e.g., with or without links). For each split, we compute a posterior distribution over its respective global BFT. Finally, we compare the posteriors to establish the feature’s effect on the BFT. We run independent analyses for each feature. Results: Our results show that the existence of links and higher code-churn values lead to BFTs that are at least twice as long. On the other hand, considering three levels of priority (low, medium, and high), we observe no difference in the BFT. Conclusion: To the best of our knowledge, this is the first study using hierarchical Bayes to extrapolate results from multiple projects and assess the global effect of different attributes on the BFT. We use this methodology to gain insight on how links, priority, and code-churn size impact the BFT. On top of that, our posteriors can be used as a prior to analyze novel projects, potentially young and scarce on data. We also believe our methodology can be reused for other generalization studies in empirical software engineering.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128271367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterizing the Usage of CI Tools in ML Projects
D. Rzig, Foyzul Hassan, Chetan Bansal, Nachiappan Nagappan. DOI: 10.1145/3544902.3546237
Background: Continuous Integration (CI) has become widely adopted to enable faster code change integration. Meanwhile, Machine Learning (ML) is being used by software applications for previously unsolvable real-world scenarios. ML projects employ development processes different from those of traditional software projects, but they too require multiple iterations in their development and may benefit from CI. Aims: While there are many works covering CI within traditional software, none of them has empirically explored the adoption of CI and its associated issues within ML projects. To address this knowledge gap, we performed an empirical analysis comparing CI adoption between ML and Non-ML projects. Method: We developed TraVanalyzer, the first Travis CI configuration analyzer, to analyze the CI practices of ML projects, and a CI log analyzer to identify the different CI problems of ML projects. Results: We found that Travis CI is the most popular CI tool for ML projects and that their CI adoption lags behind that of Non-ML projects, but ML projects that adopted CI used it for building, testing, code analysis, and automatic deployment more than Non-ML projects did. Furthermore, while CI in ML projects is as likely to experience problems as CI in Non-ML projects, it has more varied reasons for build breakage. The most frequent CI failures of ML projects are due to testing-related problems, similar to Non-ML and OSS CI failures. Conclusion: To the best of our knowledge, this is the first work to analyze ML projects’ CI usage, practices, and issues, and to contextualize its results by comparing them with similar Non-ML projects. It provides findings for researchers and ML developers to identify possible improvement scopes for CI in ML projects.
{"title":"Characterizing the Usage of CI Tools in ML Projects","authors":"D. Rzig, Foyzul Hassan, Chetan Bansal, Nachiappan Nagappan","doi":"10.1145/3544902.3546237","DOIUrl":"https://doi.org/10.1145/3544902.3546237","url":null,"abstract":"Background: Continuous Integration (CI) has become widely adopted to enable faster code change integration. Meanwhile, Machine Learning (ML) is being used by software applications for previously unsolvable real-world scenarios. ML projects employ development processes different from those of traditional software projects, but they too require multiple iterations in their development, and may benefit from CI. Aims: While there are many works covering CI within traditional software, none of them empirically explored the adoption of CI and its associated issues within ML projects. To address this knowledge gap, we performed an empirical analysis comparing CI adoption between ML and Non-ML projects. Method: We developed TraVanalyzer, the first Travis CI configuration analyzer, to analyze the CI practices of ML projects, and developed a CI log analyzer to identify the different CI problems of ML projects. Results: We found that Travis CI is the most popular CI tool for ML projects, and that their CI adoption lags behind that of Non-ML projects, but that ML projects which adopted CI, used it for building, testing, code analysis, and automatic deployment more than Non-ML projects. Furthermore, while CI in ML projects is as likely to experience problems as CI in Non-ML projects, it has more varied reasons for build-breakage. The most frequent CI failures of ML projects are due to testing-related problems, similar to Non-ML and OSS CI failures. Conclusion: To the best of our knowledge, this is the first work that has analyzed ML projects’ CI usage, practices, and issues, and contextualized its results by comparing them with similar Non-ML projects. It provides findings for researchers and ML developers to identify possible improvement scopes for CI in ML projects.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130527973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Does Collaborative Editing Help Mitigate Security Vulnerabilities in Crowd-Shared IoT Code Examples?
Madhu Selvaraj, Gias Uddin. DOI: 10.1145/3544902.3546235

Background: With the proliferation of crowd-sourced developer forums, software developers are increasingly sharing coding solutions to programming problems with others. The decentralized nature of knowledge sharing on these sites has raised concerns about the sharing of security-vulnerable code, which can then be reused in mission-critical software systems, making those systems vulnerable in the process. Collaborative editing has been introduced in forums like Stack Overflow to improve the quality of shared content. Aim: In this paper, we investigate whether code editing can mitigate shared vulnerable code examples by analyzing IoT code snippets and their revisions in three Stack Exchange sites: Stack Overflow, Arduino, and Raspberry Pi. Method: We analyze the vulnerabilities present in shared IoT C/C++ code snippets, as C/C++ is one of the most widely used languages in mission-critical and low-powered IoT devices. We further analyze the revisions made to these code snippets and their effects. Results: We find several vulnerabilities, such as CWE-788 (Access of Memory Location After End of Buffer), in 740 code snippets. However, we find that the vast majority of posts are not revised, or their revisions do not touch the code snippets themselves (598 out of 740). We also find that revisions are most likely to leave the number of vulnerabilities in a code snippet unchanged rather than deteriorating or improving the snippet. Conclusions: We conclude that the current collaborative editing system in the forums may be insufficient to help mitigate vulnerabilities in the shared code.
{"title":"Does Collaborative Editing Help Mitigate Security Vulnerabilities in Crowd-Shared IoT Code Examples?","authors":"Madhu Selvaraj, Gias Uddin","doi":"10.1145/3544902.3546235","DOIUrl":"https://doi.org/10.1145/3544902.3546235","url":null,"abstract":"Background: With the proliferation of crowd-sourced developer forums, Software developers are increasingly sharing more coding solutions to programming problems with others in forums. The decentralized nature of knowledge sharing on sites has raised the concern of sharing security vulnerable code, which then can be reused into mission critical software systems - making those systems vulnerable in the process. Collaborative editing has been introduced in forums like Stack Overflow to improve the quality of the shared contents. Aim: In this paper, we investigate whether code editing can mitigate shared vulnerable code examples by analyzing IoT code snippets and their revisions in three Stack Exchange sites: Stack Overflow, Arduino, and Raspberry Pi. Method:We analyze the vulnerabilities present in shared IoT C/C++ code snippets, as C/C++ is one of the most widely used languages in mission-critical devices and low-powered IoT devices. We further analyse the revisions made to these code snippets, and their effects. Results: We find several vulnerabilities such as CWE 788 - Access of Memory Location After End of Buffer , in 740 code snippets. However, we find the vast majority of posts are not revised, or revisions are not made to the code snippets themselves (598 out of 740). We also find that revisions are most likely to result in no change to the number of vulnerabilities in a code snippet rather than deteriorating or improving the snippet. Conclusions: We conclude that the current collaborative editing system in the forums may be insufficient to help mitigate vulnerabilities in the shared code.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"23 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134395956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Experience Report on Technical Debt in Pull Requests: Challenges and Lessons Learned
Shubhashis Karmakar, Zadia Codabux, M. Vidoni. DOI: 10.1145/3544902.3546637

Background: GitHub is a collaborative platform for global software development, where Pull Requests (PRs) are essential to bridge code changes with version control. However, developers often trade software quality for faster implementation, incurring Technical Debt (TD). When developers take on reviewers’ roles and evaluate PRs, they can often detect TD instances, leading to either PR rejection or discussion. Aims: We investigated whether Pull Request Comments (PRCs) indicate TD by assessing three large-scale repositories: Spark, Kafka, and React. Method: We combined manual classification with automated detection using machine learning and deep learning models. Results: We classified two datasets and found that 37.7% and 38.7% of PRCs indicate TD, respectively. Our best model achieved F1 = 0.85 when classifying TD during the validation phase. Conclusions: We faced several challenges during this process, which may hint that TD in PRCs is discussed differently than in other software artifacts (e.g., code comments, commits, issues, or discussion forums). Thus, we present challenges and lessons learned to assist researchers in pursuing this area of research.
{"title":"An Experience Report on Technical Debt in Pull Requests: Challenges and Lessons Learned","authors":"Shubhashis Karmakar, Zadia Codabux, M. Vidoni","doi":"10.1145/3544902.3546637","DOIUrl":"https://doi.org/10.1145/3544902.3546637","url":null,"abstract":"Background: GitHub is a collaborative platform for global software development, where Pull Requests (PRs) are essential to bridge code changes with version control. However, developers often trade software quality for faster implementation, incurring Technical Debt (TD). When developers undertake reviewers’ roles and evaluate PRs, they can often detect TD instances, leading to either PR rejection or discussions. Aims: We investigated whether Pull Request Comments (PRCs) indicate TD by assessing three large-scale repositories: Spark, Kafka, and React. Method: We combined manual classification with automated detection using machine learning and deep learning models. Results: We classified two datasets and found that 37.7 and 38.7% of PRCs indicate TD, respectively. Our best model achieved F1 = 0.85 when classifying TD during the validation phase. Conclusions: We faced several challenges during this process, which may hint that TD in PRCs is discussed differently from other software artifacts (e.g., code comments, commits, issues, or discussion forums). Thus, we present challenges and lessons learned to assist researchers in pursuing this area of research.","PeriodicalId":220679,"journal":{"name":"Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"200 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134041006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}