Sravya Polisetty, A. Miranskyy, A. Bener. "On Usefulness of the Deep-Learning-Based Bug Localization Models to Practitioners." Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering, 2019-07-19. DOI: https://doi.org/10.1145/3345629.3345632

Abstract: Background: Developers spend a significant amount of time and effort localizing bugs. Many researchers have proposed state-of-the-art bug localization models to help developers localize bugs more easily. Practitioners, on the other hand, expect a bug localization tool to meet certain criteria, such as trustworthiness, scalability, and efficiency. The current models do not meet these criteria, which makes them harder to adopt in practice. Recently, deep-learning-based bug localization models have been proposed in the literature; they show better performance than the existing state-of-the-art models. Aim: In this research, we investigate whether deep learning models meet practitioners' expectations. Method: We constructed a Convolutional Neural Network and a Simple Logistic model to examine their effectiveness in localizing bugs. We trained these models on five open-source projects written in Java and compared their performance with that of other state-of-the-art models trained on the same datasets. Results: Our experiments show that although the deep learning models perform better than classic machine learning models, they only partially meet the adoption criteria set by practitioners. Conclusions: This work provides evidence that practitioners should be cautious when using current state-of-the-art models in production-level use cases. It also highlights the need for standardized performance benchmarks to ensure that bug localization models are assessed equitably and realistically.
Tapajit Dey, Yuxing Ma, A. Mockus. "Patterns of Effort Contribution and Demand and User Classification based on Participation Patterns in NPM Ecosystem." Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering, 2019-07-15. DOI: https://doi.org/10.1145/3345629.3345634

Abstract: Background: Open source requires the participation of volunteer and commercial developers (users) in order to deliver functional, high-quality components. Developers both contribute effort in the form of patches and demand effort from component maintainers to resolve the issues reported against those components. Open source components depend on each other directly and transitively, and evidence suggests that more effort is required to report and resolve issues filed further upstream in this supply chain. Aim: Identify and characterize patterns of effort contribution and demand throughout the open source supply chain and investigate if and how these patterns vary with developer activity; identify different groups of developers; and predict developers' company affiliation based on their participation patterns. Method: We obtained and analyzed 1,376,946 issues and pull requests created for 4,433 NPM packages with over 10,000 monthly downloads, along with the full (public) commit activity data of the 272,142 issue creators, and identified dependencies on NPM packages. The fuzzy c-means clustering algorithm is used to find groups among the users based on their effort contribution and demand patterns, and Random Forest is used as the predictive modeling technique to identify their company affiliations. Result: Users contribute and demand effort primarily from packages that they depend on directly, with only a tiny fraction of contributions and demand going to transitive dependencies. A significant portion of demand goes to packages outside the users' respective supply chains (constructed from publicly visible version control data). Three and two distinct groups of users are observed based on the effort demand and effort contribution patterns, respectively. The Random Forest model used to identify the company affiliation of the users gives an AUC-ROC value of 0.68, and variables representing aggregate participation patterns proved to be the important predictors. Conclusion: Our results give new insights into effort demand and supply at different parts of the supply chain of the NPM ecosystem and its users, and suggest the need to increase visibility further upstream.