Muhammad Diaphan Nizam Arusada, N. Putri, A. Alamsyah
Training data optimization strategy for multiclass text classification
2017 5th International Conference on Information and Communication Technology (ICoIC7), published 2017-05-17
DOI: 10.1109/ICOICT.2017.8074652 (https://doi.org/10.1109/ICOICT.2017.8074652)
Citations: 25
Abstract
Big data is now widespread across social media, giving businesses an opportunity to obtain information in real time. Because social media data is unstructured, it must be processed before use. Machine learning needs suitable training data for a classification model to perform accurately. Achieving this requires sound domain knowledge and the right strategy for building optimal training data. This paper presents a strategy for building optimal training data from customer complaint data collected from Twitter. We use both Naive Bayes and Support Vector Machine as classifiers. The experimental results show that our training data optimization strategy gives good performance for a multi-class text classification model.
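
To make the classification setting concrete, the following is a minimal sketch of multi-class text classification with a multinomial Naive Bayes classifier, one of the two classifier families named in the abstract. It is not the paper's pipeline or its training data optimization strategy. The class labels, the example complaints, the whitespace tokenizer, and the add-one smoothing are all assumptions made for illustration, and the Support Vector Machine is not covered.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real tweets would need more cleaning."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for each class."""
        scores = {}
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen during training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training set; texts and labels are invented for illustration.
train_texts = [
    "internet connection is slow again",
    "slow internet speed since morning",
    "my bill is too high this month",
    "wrong charge on my bill",
    "no signal on my phone",
    "phone signal keeps dropping",
]
train_labels = ["network", "network", "billing", "billing", "signal", "signal"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("why is my bill so high")
print(prediction)  # prints "billing"
```

With so few examples, every training sentence strongly shapes the per-class word counts, which hints at why the choice and labeling of training data matter for this kind of model.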