Chris Mills, Jevgenija Pantiuchina, Esteban Parra, G. Bavota, S. Haiduc
{"title":"Are Bug Reports Enough for Text Retrieval-Based Bug Localization?","authors":"Chris Mills, Jevgenija Pantiuchina, Esteban Parra, G. Bavota, S. Haiduc","doi":"10.1109/ICSME.2018.00046","DOIUrl":null,"url":null,"abstract":"Text Retrieval (TR) has been widely used to support many software engineering tasks, including bug localization (i.e., the activity of localizing buggy code starting from a bug report). Many studies show TR's effectiveness in lowering the manual effort required to perform this maintenance task; however, the actual usefulness of TR-based bug localization has been questioned in recent studies. These studies discuss (i) potential biases in the experimental design usually adopted to evaluate TRbased bug localization techniques and (ii) their poor performance in the scenario when they are needed most: when the bug report, which serves as the de facto query in most studies, does not contain localization hints (e.g., code snippets, method names, etc.) Fundamentally, these studies raise the question: do bug reports provide sufficient information to perform TR-based localization? In this work, we approach that question from two perspectives. First, we investigate potential biases in the evaluation of TR-based approaches which artificially boost the performance of these techniques, making them appear more successful than they are. Second, we analyze bug report text with and without localization hints using a genetic algorithm to derive a near-optimal query that provides insight into the potential of that bug report for use in TR-based localization. Through this analysis we show that in most cases the bug report vocabulary (i.e., the terms contained in the bug title and description) is all we need to formulate effective queries, making TR-based bug localization successful without supplementary query expansion. Most notably, this also holds when localization hints are completely absent from the bug report. In fact, our results suggest that the next major step in improving TR-based bug localization is the ability to formulate these near-optimal queries.","PeriodicalId":6572,"journal":{"name":"2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"14 1","pages":"381-392"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME.2018.00046","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43
Abstract
Text Retrieval (TR) has been widely used to support many software engineering tasks, including bug localization (i.e., the activity of localizing buggy code starting from a bug report). Many studies show TR's effectiveness in lowering the manual effort required to perform this maintenance task; however, the actual usefulness of TR-based bug localization has been questioned in recent studies. These studies discuss (i) potential biases in the experimental design usually adopted to evaluate TRbased bug localization techniques and (ii) their poor performance in the scenario when they are needed most: when the bug report, which serves as the de facto query in most studies, does not contain localization hints (e.g., code snippets, method names, etc.) Fundamentally, these studies raise the question: do bug reports provide sufficient information to perform TR-based localization? In this work, we approach that question from two perspectives. First, we investigate potential biases in the evaluation of TR-based approaches which artificially boost the performance of these techniques, making them appear more successful than they are. Second, we analyze bug report text with and without localization hints using a genetic algorithm to derive a near-optimal query that provides insight into the potential of that bug report for use in TR-based localization. Through this analysis we show that in most cases the bug report vocabulary (i.e., the terms contained in the bug title and description) is all we need to formulate effective queries, making TR-based bug localization successful without supplementary query expansion. Most notably, this also holds when localization hints are completely absent from the bug report. In fact, our results suggest that the next major step in improving TR-based bug localization is the ability to formulate these near-optimal queries.