Authors: Wilson Fonda, A. Purwarianti
Published in: 2014 International Conference on Advanced Computer Science and Information System (2014-10-01)
DOI: 10.1109/ICACSIS.2014.7065879 (https://doi.org/10.1109/ICACSIS.2014.7065879)
Experiments on keyword list generation by term distribution clustering for text classification
Text classification is a useful task in text mining. Most researchers employ a single type of word weight for text classification. Here, we propose building a keyword list by combining several word weights for rule-based multi-label text classification. We conducted experiments on term distribution clustering to produce the best automatically generated keyword list, comparing several term weights such as TFxIDF, mutual information (MI), information gain (IG), and document frequency (DF). As a case study, we applied the generated keyword list to authority classification in a complaint management system. Experiments on 245 Twitter posts, using a keyword list generated from 2,325 Twitter posts, showed that the best accuracy was achieved when all term weights were used in the term distribution clustering, rather than only one.
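To make the term weights named in the abstract concrete, the following is a minimal sketch of how DF, a corpus-level TFxIDF, and pointwise MI might be computed on a toy labeled corpus, and how a keyword list could drive a rule-based multi-label classifier. It is not the authors' method: the abstract does not describe the clustering procedure, so none is implemented here. The corpus, label names, the particular TFxIDF and MI formulations, and all function names are assumptions chosen for illustration. IG is omitted for brevity.

```python
import math
from collections import Counter

# Toy labeled corpus (hypothetical data, not taken from the paper).
DOCS = [
    ("street lamp broken on main road", {"public_works"}),
    ("road full of holes near market", {"public_works"}),
    ("garbage not collected for a week", {"sanitation"}),
    ("garbage piles up along the road", {"sanitation", "public_works"}),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real tweets need more care)."""
    return text.lower().split()


def document_frequency(docs):
    """DF: number of documents that contain each term."""
    df = Counter()
    for text, _ in docs:
        df.update(set(tokenize(text)))
    return df


def tf_idf(docs):
    """Corpus-level TFxIDF: total term count times log(N / DF)."""
    n = len(docs)
    df = document_frequency(docs)
    tf = Counter()
    for text, _ in docs:
        tf.update(tokenize(text))
    return {term: tf[term] * math.log(n / df[term]) for term in tf}


def mutual_information(docs, label):
    """Pointwise MI of term presence and a label: log(P(t, c) / (P(t) * P(c))).

    Only terms that co-occur with the label at least once are scored.
    """
    n = len(docs)
    df = document_frequency(docs)
    n_label = sum(1 for _, labels in docs if label in labels)
    joint = Counter()
    for text, labels in docs:
        if label in labels:
            joint.update(set(tokenize(text)))
    return {
        term: math.log((joint[term] / n) / ((df[term] / n) * (n_label / n)))
        for term in joint
    }


def classify(text, keyword_lists):
    """Rule-based multi-label step: assign every label whose keywords appear."""
    tokens = set(tokenize(text))
    return {label for label, keywords in keyword_lists.items() if tokens & keywords}


# Hand-picked keyword lists for the demo; in the paper these are generated automatically.
KEYWORDS = {"sanitation": {"garbage"}, "public_works": {"road", "lamp"}}
print(classify("garbage on the road again", KEYWORDS))
```

A text can match several keyword lists at once, which is what makes the rule-based step multi-label. In this sketch the keyword lists are written by hand; combining the scores above into one list per label is the part the paper addresses and is left out here.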