An Empirical Study of the Bug Link Rate
Chenglin Li, Yangyang Zhao, Yibiao Yang
2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), 2022-12-01
DOI: 10.1109/QRS57517.2022.00028 (https://doi.org/10.1109/QRS57517.2022.00028)
Defect data is critical for software defect prediction. To collect defect data, it is essential to establish links between bugs and their fixes. Missing links (i.e., a low link rate) can introduce false negatives into the defect dataset and bias experimental results. Despite the importance of bug links, little prior work has used the bug link rate as a criterion for selecting subjects, and there is no empirical evidence on whether simpler alternative criteria can be used to evaluate a project’s link rate to aid selection. To this end, we conduct a comprehensive study of the bug link rate. Based on 34 open-source projects, we perform a detailed statistical analysis of the projects’ actual link rates and examine the factors affecting link rates from both quantitative and qualitative perspectives. The findings could improve the understanding of bug link rates and guide the selection of better subjects for defect prediction.