New Methodology for Contextual Features Usage in Duplicate Bug Reports Detection: Dimension Expansion based on Manhattan Distance Similarity of Topics
Behzad Soleimani Neysiani, Seyed Morteza Babamir
2019 5th International Conference on Web Research (ICWR), pp. 178-183, 2019-04-24
DOI: 10.1109/ICWR.2019.8765296
Duplicate bug report detection is one of the major problems that software triage systems such as Bugzilla face when handling end-user requests. A user request contains categorical and, especially, textual fields, which require feature extraction before duplicates can be detected. Contextual and topical features are obtained by calculating the cosine similarity between the term frequency, inverse document frequency, or BM25F scores of a pair of bug reports against a set of topics. This research proposes replacing the cosine similarity with an individual Manhattan distance similarity for every topic in the contextual features, which expands the feature dimension and can increase the accuracy of duplicate bug report detection. The proposed method was evaluated on four well-known bug report datasets: Android, Eclipse, Mozilla, and Open Office. The experimental results indicate a performance improvement for four contextual features: general, cryptography, network, and Java topics.
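The difference between the two feature schemes can be illustrated with a small sketch. It rests on one reading of the abstract, not on the authors' published formulas: each report is scored against every topic, the cosine similarity collapses the two score vectors into a single feature, and the per-topic Manhattan (absolute) difference keeps one feature per topic. The topic word lists, the raw term-frequency scoring (a stand-in for TF, IDF, or BM25F), and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical topic word lists; the paper's actual lists are not given here.
TOPICS = {
    "network": ["socket", "http", "timeout", "connection"],
    "java": ["exception", "class", "thread", "null"],
}


def topic_scores(text, topics=TOPICS):
    """Score one report against each topic with raw term frequency.

    Assumption: raw counts stand in for the TF, IDF, or BM25F weighting.
    """
    counts = Counter(text.lower().split())
    return [sum(counts[word] for word in words) for words in topics.values()]


def cosine_similarity(u, v):
    """One feature for the whole pair: cosine of the two topic-score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def per_topic_manhattan(u, v):
    """One feature per topic: absolute difference of the topic scores."""
    return [abs(a - b) for a, b in zip(u, v)]


report_a = "connection timeout on http socket"
report_b = "null exception in thread after connection timeout"

scores_a = topic_scores(report_a)
scores_b = topic_scores(report_b)

print(cosine_similarity(scores_a, scores_b))    # a single scalar feature
print(per_topic_manhattan(scores_a, scores_b))  # len(TOPICS) features
```

With T topics, the cosine variant gives a classifier one number per report pair, while the per-topic variant gives it T numbers, which is the dimension expansion the abstract refers to under this reading.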