{"title":"基于bert的GitHub问题报告分类","authors":"Mohammed Latif Siddiq, Joanna C. S. Santos","doi":"10.1145/3528588.3528660","DOIUrl":null,"url":null,"abstract":"Issue tracking is one of the integral parts of software development, especially for open source projects. GitHub, a commonly used software management tool, provides its own issue tracking system. Each issue can have various tags, which are manually assigned by the project’s developers. However, manually labeling software reports is a time-consuming and error-prone task. In this paper, we describe a BERT-based classification technique to automatically label issues as questions, bugs, or enhancements. We evaluate our approach using a dataset containing over 800,000 labeled issues from real open source projects available on GitHub. Our approach classified reported issues with an average F1-score of 0.8571. Our technique outperforms a previous machine learning technique based on FastText.","PeriodicalId":313397,"journal":{"name":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"BERT-Based GitHub Issue Report Classification\",\"authors\":\"Mohammed Latif Siddiq, Joanna C. S. Santos\",\"doi\":\"10.1145/3528588.3528660\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Issue tracking is one of the integral parts of software development, especially for open source projects. GitHub, a commonly used software management tool, provides its own issue tracking system. Each issue can have various tags, which are manually assigned by the project’s developers. However, manually labeling software reports is a time-consuming and error-prone task. In this paper, we describe a BERT-based classification technique to automatically label issues as questions, bugs, or enhancements. We evaluate our approach using a dataset containing over 800,000 labeled issues from real open source projects available on GitHub. Our approach classified reported issues with an average F1-score of 0.8571. Our technique outperforms a previous machine learning technique based on FastText.\",\"PeriodicalId\":313397,\"journal\":{\"name\":\"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)\",\"volume\":\"68 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3528588.3528660\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3528588.3528660","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Issue tracking is one of the integral parts of software development, especially for open source projects. GitHub, a commonly used software management tool, provides its own issue tracking system. Each issue can have various tags, which are manually assigned by the project’s developers. However, manually labeling software reports is a time-consuming and error-prone task. In this paper, we describe a BERT-based classification technique to automatically label issues as questions, bugs, or enhancements. We evaluate our approach using a dataset containing over 800,000 labeled issues from real open source projects available on GitHub. Our approach classified reported issues with an average F1-score of 0.8571. Our technique outperforms a previous machine learning technique based on FastText.