Comparing Hierarchical Dirichlet Process with Latent Dirichlet Allocation in Bug Report Multiclass Classification
Authors: Nachai Limsettho, Hideaki Hata, Ken-ichi Matsumoto
Venue: 15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)
Published: 2014-06-01
DOI: 10.1109/SNPD.2014.6888695 (https://doi.org/10.1109/SNPD.2014.6888695)
Bug reports play essential roles in many software engineering tasks. Because the validity and performance of these tasks depend on the quality of bug reports, accurate information in bug reports is very important. However, a previous study found that a significant number of reports classified as bugs are not actually bugs. Recent studies have proposed techniques to automatically classify bug reports into binary classes, but this leaves room for improvement. Bug reports can also be classified into multiple classes, which could help identify what the reports are actually about. Moreover, previous work considered only one topic modeling method, Latent Dirichlet Allocation (LDA). While LDA has its advantages, it requires parameter tuning. In this paper, we propose a nonparametric approach that automatically classifies bug reports using another topic modeling method, the Hierarchical Dirichlet Process (HDP). The results indicate that the performance of our nonparametric approach is comparable to that of the parametric one. We also examine various aspects of LDA to provide a more thorough understanding of this process.
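
To make the parameter-tuning point concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is not the authors' implementation: the function name `lda_gibbs`, the hyperparameter defaults, and the toy reports are all assumptions chosen for illustration, and the abstract does not state which inference method, classifier, or settings the paper used. The sketch covers only the feature-extraction step, where each report is turned into a vector of topic proportions that a multiclass classifier could consume. Note that the number of topics must be supplied up front, which is the tuning burden a nonparametric model such as HDP is meant to remove.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. num_topics has to be chosen by the
    caller; alpha and beta are assumed smoothing values, not values
    taken from the paper. Returns one topic-proportion vector per doc.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    topic_total = [0] * K                     # tokens assigned to topic k overall
    assign = []                               # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], assign[d][i]
                # Remove the token from the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token,
                # up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    return [
        [(n + alpha) / (len(doc) + K * alpha) for n in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


# Hypothetical toy reports, tokenized by whitespace.
reports = [
    "crash null pointer exception on startup".split(),
    "exception thrown when parser reads empty file".split(),
    "please add dark theme option to settings".split(),
    "feature request add export option to settings".split(),
]

features = lda_gibbs(reports, num_topics=2)
```

Each row of `features` has one entry per topic, so changing `num_topics` changes the dimensionality of the input the classifier sees. With HDP the number of topics is inferred from the data rather than fixed in advance, although HDP still has concentration hyperparameters of its own.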