Experiments on keyword list generation by term distribution clustering for text classification
Authors: Wilson Fonda, A. Purwarianti
Published in: 2014 International Conference on Advanced Computer Science and Information System (2014-10-01)
DOI: 10.1109/ICACSIS.2014.7065879 (https://doi.org/10.1109/ICACSIS.2014.7065879)
Citation count: 2
Abstract
Text classification is a useful task in text mining. Most researchers employ a single type of word weight in text classification. Here, we propose building a keyword list by combining several word weights for rule-based multi-label text classification. In this research, we conducted experiments on term distribution clustering to produce the best automatically generated keyword list. We compared several term weights such as TFxIDF, MI, IG, and DF. As a case study, we implemented authority classification in a complaint management system using the generated keyword list. Experiments on 245 Twitter posts, using a keyword list generated from 2325 Twitter posts, showed that using all term weights in the term distribution clustering achieved better accuracy than using only one term weight.
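
The abstract does not give formulas for the four term weights. As a rough illustration only, the sketch below computes them for one term and one category using common textbook definitions, reading MI as pointwise mutual information, IG as information gain, and DF as document frequency. The function name `term_weights`, the toy corpus, and the category labels are all hypothetical, and the clustering step that combines the weights into a keyword list is not shown, since the abstract does not describe it.

```python
import math


def entropy(*counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(k / total * math.log2(k / total) for k in counts if k)


def term_weights(docs, labels, term, category):
    """Return DF, TF-IDF, MI and IG of `term` with respect to `category`.

    docs:   list of token lists, one per document
    labels: list of category names, one per document
    These are textbook definitions, assumed here; the paper may differ.
    """
    n = len(docs)
    pairs = list(zip(docs, labels))
    # Document-level contingency counts for (term present?, in category?).
    a = sum(1 for doc, y in pairs if term in doc and y == category)
    b = sum(1 for doc, y in pairs if term in doc and y != category)
    c = sum(1 for doc, y in pairs if term not in doc and y == category)
    d = n - a - b - c

    df = a + b                                  # document frequency
    tf = sum(doc.count(term) for doc in docs)   # corpus-wide term frequency
    tfidf = tf * math.log(n / df) if df else 0.0

    # Pointwise mutual information between term presence and the category.
    mi = math.log2(a * n / ((a + b) * (a + c))) if a else float("-inf")

    # Information gain: H(C) - H(C | term present/absent), binary category.
    ig = (entropy(a + c, b + d)
          - (df / n) * entropy(a, b)
          - ((n - df) / n) * entropy(c, d))
    return {"df": df, "tfidf": tfidf, "mi": mi, "ig": ig}


# Toy corpus (hypothetical data, not from the paper).
docs = [
    ["road", "damaged", "road"],
    ["pothole", "road"],
    ["trash", "piled"],
    ["trash", "smell"],
]
labels = ["roads", "roads", "sanitation", "sanitation"]

w = term_weights(docs, labels, "road", "roads")
print(w)
```

In this toy corpus "road" appears in every "roads" document and in no other, so its MI and IG for that category are both 1 bit. A keyword list built from several weights, as the abstract proposes, would draw on all four values rather than ranking terms by any single one.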