{"title":"使用文本错误报告预测软件错误的故障类别","authors":"Thomas Hirsch, Birgit Hofer","doi":"10.1016/j.array.2022.100189","DOIUrl":null,"url":null,"abstract":"<div><p>Debugging is a time-consuming and expensive process. Developers have to select appropriate tools, methods and approaches in order to efficiently reproduce, localize and fix bugs. These choices are based on the developers’ assessment of the type of fault for a given bug report. This paper proposes a machine learning (ML) based approach that predicts the fault type for a given textual bug report. We built a dataset from 70+ projects for training and evaluation of our approach. Further, we performed a user study to establish a baseline for non-expert human performance on this task. Our models, incorporating our custom preprocessing approaches, reach up to 0.69% macro average F1 score on this bug classification problem. We demonstrate inter-project transferability of our approach. Further, we identify and discuss issues and limitations of ML classification approaches applied on textual bug reports. Our models can support researchers in data collection efforts, as for example bug benchmark creation. In future, such models could aid inexperienced developers in debugging tool selection, helping save time and resources.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"15 ","pages":"Article 100189"},"PeriodicalIF":2.3000,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S259000562200042X/pdfft?md5=d85c4e5667d78881a25943827c972501&pid=1-s2.0-S259000562200042X-main.pdf","citationCount":"2","resultStr":"{\"title\":\"Using textual bug reports to predict the fault category of software bugs\",\"authors\":\"Thomas Hirsch, Birgit Hofer\",\"doi\":\"10.1016/j.array.2022.100189\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Debugging is a time-consuming and expensive process. Developers have to select appropriate tools, methods and approaches in order to efficiently reproduce, localize and fix bugs. These choices are based on the developers’ assessment of the type of fault for a given bug report. This paper proposes a machine learning (ML) based approach that predicts the fault type for a given textual bug report. We built a dataset from 70+ projects for training and evaluation of our approach. Further, we performed a user study to establish a baseline for non-expert human performance on this task. Our models, incorporating our custom preprocessing approaches, reach up to 0.69% macro average F1 score on this bug classification problem. We demonstrate inter-project transferability of our approach. Further, we identify and discuss issues and limitations of ML classification approaches applied on textual bug reports. Our models can support researchers in data collection efforts, as for example bug benchmark creation. In future, such models could aid inexperienced developers in debugging tool selection, helping save time and resources.</p></div>\",\"PeriodicalId\":8417,\"journal\":{\"name\":\"Array\",\"volume\":\"15 \",\"pages\":\"Article 100189\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2022-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S259000562200042X/pdfft?md5=d85c4e5667d78881a25943827c972501&pid=1-s2.0-S259000562200042X-main.pdf\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Array\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S259000562200042X\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Array","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S259000562200042X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Using textual bug reports to predict the fault category of software bugs
Debugging is a time-consuming and expensive process. Developers have to select appropriate tools, methods and approaches in order to efficiently reproduce, localize and fix bugs. These choices are based on the developers’ assessment of the type of fault for a given bug report. This paper proposes a machine learning (ML) based approach that predicts the fault type for a given textual bug report. We built a dataset from 70+ projects for training and evaluation of our approach. Further, we performed a user study to establish a baseline for non-expert human performance on this task. Our models, incorporating our custom preprocessing approaches, reach up to 0.69% macro average F1 score on this bug classification problem. We demonstrate inter-project transferability of our approach. Further, we identify and discuss issues and limitations of ML classification approaches applied on textual bug reports. Our models can support researchers in data collection efforts, as for example bug benchmark creation. In future, such models could aid inexperienced developers in debugging tool selection, helping save time and resources.