Investigating active learning techniques for document level sentiment classification of tweets
Ayush Kumar, Chaitanya Kansal, Asif Ekbal
2015 7th International Conference on Communication Systems and Networks (COMSNETS)
DOI: 10.1109/COMSNETS.2015.7098727 (https://doi.org/10.1109/COMSNETS.2015.7098727)
Active learning is a technique for automatically selecting useful instances from unlabelled data so that, when they are added to the training data, overall classification performance improves. Creating training examples otherwise involves significant cost and effort, which is a major constraint on supervised algorithms. In this paper, we investigate the effectiveness of active learning for sentiment classification of Tweets. The algorithm selects informative unlabelled data using uncertainty sampling, which dictates that only those Tweets be added to the training set that let the classifier quickly refine its decision boundary. Our experiments on a benchmark dataset of Tweets show an overall accuracy of 83.95%, an improvement of 6.75% over the baseline model, which was constructed by training a Support Vector Machine (SVM) with the full set of available features. The approach is very general, and is therefore scalable, domain-adaptable and easy to implement for a wide variety of problems.
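The uncertainty sampling idea described above can be illustrated with a minimal sketch. The abstract does not state the exact selection criterion, so this sketch assumes a common choice for margin-based classifiers such as an SVM: pick the unlabelled items whose signed decision score is closest to zero. All names here (`select_most_uncertain`, `active_learning_round`, and the `train` and `oracle` callables) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Labelled = List[Tuple[str, int]]


def select_most_uncertain(margins: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k items closest to the decision boundary.

    Assumption: `margins` holds signed decision scores (for example, the
    distance to an SVM hyperplane), so a small absolute value means the
    classifier is uncertain about that item.
    """
    ranked = sorted(range(len(margins)), key=lambda i: abs(margins[i]))
    return ranked[:k]


def active_learning_round(
    labelled: Labelled,
    pool: List[str],
    train: Callable[[Labelled], Callable[[str], float]],
    oracle: Callable[[str], int],
    k: int,
) -> Tuple[Labelled, List[str]]:
    """Run one round: train, score the pool, label the k most uncertain items.

    `train` (hypothetical) fits a classifier and returns a scoring function;
    `oracle` (hypothetical) stands in for the human annotator.
    """
    score = train(labelled)
    margins = [score(text) for text in pool]
    chosen = set(select_most_uncertain(margins, k))
    # Move the chosen items from the pool into the labelled set.
    new_labelled = labelled + [(pool[i], oracle(pool[i])) for i in sorted(chosen)]
    remaining = [text for i, text in enumerate(pool) if i not in chosen]
    return new_labelled, remaining
```

Repeating `active_learning_round` until a labelling budget is spent gives a basic pool-based loop. Only the scoring function depends on the classifier, so the same loop could in principle be reused with a different model or domain.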