{"title":"Duplicate issue detection for the Android open source project","authors":"Kasthuri Jayarajah, Meera Radhakrishnan, Camellia Zakaria","doi":"10.1145/2975961.2975965","DOIUrl":null,"url":null,"abstract":"The Android Open Source Project(AOSP) has seen tremendous traction over the past decade, and as such, the bug repository is growing in scale. With this growth, the effort required for project members to triage incoming new reports to identify whether it is a duplicate issue that has already been addressed, or receiving attention, is also on the rise. In this work, we create dataset of issues from the Android issue tracker, and use standard IR techniques such as VSM and LDA to understand their capability in such similar issue retrieval. Further, we combine VSM and LDA to evaluate its usefulness. We find that, overall, VSM performs better with this dataset.","PeriodicalId":106703,"journal":{"name":"Proceedings of the 5th International Workshop on Software Mining","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th International Workshop on Software Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2975961.2975965","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
The Android Open Source Project(AOSP) has seen tremendous traction over the past decade, and as such, the bug repository is growing in scale. With this growth, the effort required for project members to triage incoming new reports to identify whether it is a duplicate issue that has already been addressed, or receiving attention, is also on the rise. In this work, we create dataset of issues from the Android issue tracker, and use standard IR techniques such as VSM and LDA to understand their capability in such similar issue retrieval. Further, we combine VSM and LDA to evaluate its usefulness. We find that, overall, VSM performs better with this dataset.