{"title":"基于N-Gram IDF的自承认技术债务自动分类","authors":"Supatsara Wattanakriengkrai, Napat Srisermphoak, Sahawat Sintoplertchaikul, Morakot Choetkiertikul, Chaiyong Ragkhitwetsagul, T. Sunetnanta, Hideaki Hata, Ken-ichi Matsumoto","doi":"10.1109/APSEC48747.2019.00050","DOIUrl":null,"url":null,"abstract":"Technical Debt (TD) introduces a quality problem and increases maintenance cost since it may require improvements in the future. Several studies show that it is possible to automatically detect TD from source code comments that developers intentionally created, so-called self-admitted technical debt (SATD). Those studies proposed to use binary classification technique to predict whether a comment shows SATD. However, SATD has different types (e.g. design SATD and requirement SATD). In this paper, we therefore propose an approach using N-gram Inverse Document Frequency (IDF) and employ a multi-class classification technique to build a model that can identify different types of SATD. From the empirical evaluation on 10 open-source projects, our approach outperforms alternative methods (e.g. using BOW and TF-IDF). Our approach also improves the prediction performance over the baseline benchmark by 33%.","PeriodicalId":325642,"journal":{"name":"2019 26th Asia-Pacific Software Engineering Conference (APSEC)","volume":"411 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Automatic Classifying Self-Admitted Technical Debt Using N-Gram IDF\",\"authors\":\"Supatsara Wattanakriengkrai, Napat Srisermphoak, Sahawat Sintoplertchaikul, Morakot Choetkiertikul, Chaiyong Ragkhitwetsagul, T. Sunetnanta, Hideaki Hata, Ken-ichi Matsumoto\",\"doi\":\"10.1109/APSEC48747.2019.00050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Technical Debt (TD) introduces a quality problem and increases maintenance cost since it may require improvements in the future. Several studies show that it is possible to automatically detect TD from source code comments that developers intentionally created, so-called self-admitted technical debt (SATD). Those studies proposed to use binary classification technique to predict whether a comment shows SATD. However, SATD has different types (e.g. design SATD and requirement SATD). In this paper, we therefore propose an approach using N-gram Inverse Document Frequency (IDF) and employ a multi-class classification technique to build a model that can identify different types of SATD. From the empirical evaluation on 10 open-source projects, our approach outperforms alternative methods (e.g. using BOW and TF-IDF). Our approach also improves the prediction performance over the baseline benchmark by 33%.\",\"PeriodicalId\":325642,\"journal\":{\"name\":\"2019 26th Asia-Pacific Software Engineering Conference (APSEC)\",\"volume\":\"411 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 26th Asia-Pacific Software Engineering Conference (APSEC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/APSEC48747.2019.00050\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 26th Asia-Pacific Software Engineering Conference (APSEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSEC48747.2019.00050","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Automatic Classifying Self-Admitted Technical Debt Using N-Gram IDF
Technical Debt (TD) introduces a quality problem and increases maintenance cost since it may require improvements in the future. Several studies show that it is possible to automatically detect TD from source code comments that developers intentionally created, so-called self-admitted technical debt (SATD). Those studies proposed to use binary classification technique to predict whether a comment shows SATD. However, SATD has different types (e.g. design SATD and requirement SATD). In this paper, we therefore propose an approach using N-gram Inverse Document Frequency (IDF) and employ a multi-class classification technique to build a model that can identify different types of SATD. From the empirical evaluation on 10 open-source projects, our approach outperforms alternative methods (e.g. using BOW and TF-IDF). Our approach also improves the prediction performance over the baseline benchmark by 33%.