{"title":"MUTUAL:基于不确定性采样的多领域情感分类","authors":"K. Katsarou, Roxana Jeney, K. Stefanidis","doi":"10.1145/3555776.3577765","DOIUrl":null,"url":null,"abstract":"Multi-domain sentiment classification trains a classifier using multiple domains and then tests the classifier on one of the domains. Importantly, no domain is assumed to have sufficient labeled data; instead, the goal is leveraging information between domains, making multi-domain sentiment classification a very realistic scenario. Typically, labeled data is costly because humans must classify it manually. In this context, we propose the MUTUAL approach that learns general and domain-specific sentence embeddings that are also context-aware due to the attention mechanism. In this work, we propose using a stacked BiLSTM-based Autoencoder with an attention mechanism to generate the two above-mentioned types of sentence embeddings. Then, using the Jensen-Shannon (JS) distance, the general sentence embeddings of the four most similar domains to the target domain are selected. The selected general sentence embeddings and the domain-specific embeddings are concatenated and fed into a dense layer for training. Evaluation results on public datasets with 16 different domains demonstrate the efficiency of our model. In addition, we propose an active learning algorithm that first applies the elliptic envelope for outlier removal to a pool of unlabeled data that the MUTUAL model then classifies. Next, the most uncertain data points are selected to be labeled based on the least confidence metric. The experiments show higher accuracy for querying 38% of the original data than random sampling.","PeriodicalId":42971,"journal":{"name":"Applied Computing Review","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"MUTUAL: Multi-Domain Sentiment Classification via Uncertainty Sampling\",\"authors\":\"K. Katsarou, Roxana Jeney, K. Stefanidis\",\"doi\":\"10.1145/3555776.3577765\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-domain sentiment classification trains a classifier using multiple domains and then tests the classifier on one of the domains. Importantly, no domain is assumed to have sufficient labeled data; instead, the goal is leveraging information between domains, making multi-domain sentiment classification a very realistic scenario. Typically, labeled data is costly because humans must classify it manually. In this context, we propose the MUTUAL approach that learns general and domain-specific sentence embeddings that are also context-aware due to the attention mechanism. In this work, we propose using a stacked BiLSTM-based Autoencoder with an attention mechanism to generate the two above-mentioned types of sentence embeddings. Then, using the Jensen-Shannon (JS) distance, the general sentence embeddings of the four most similar domains to the target domain are selected. The selected general sentence embeddings and the domain-specific embeddings are concatenated and fed into a dense layer for training. Evaluation results on public datasets with 16 different domains demonstrate the efficiency of our model. In addition, we propose an active learning algorithm that first applies the elliptic envelope for outlier removal to a pool of unlabeled data that the MUTUAL model then classifies. 
Next, the most uncertain data points are selected to be labeled based on the least confidence metric. The experiments show higher accuracy for querying 38% of the original data than random sampling.\",\"PeriodicalId\":42971,\"journal\":{\"name\":\"Applied Computing Review\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.4000,\"publicationDate\":\"2023-03-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Computing Review\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3555776.3577765\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computing Review","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3555776.3577765","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
MUTUAL: Multi-Domain Sentiment Classification via Uncertainty Sampling
Multi-domain sentiment classification trains a classifier on data from multiple domains and then tests it on one of those domains. Importantly, no single domain is assumed to have sufficient labeled data; instead, the goal is to leverage information across domains, which makes multi-domain sentiment classification a very realistic scenario. Labeled data is typically costly because humans must annotate it manually. In this context, we propose MUTUAL, an approach that learns general and domain-specific sentence embeddings that are also context-aware thanks to an attention mechanism. Specifically, we use a stacked BiLSTM-based autoencoder with an attention mechanism to generate these two types of sentence embeddings. Then, using the Jensen-Shannon (JS) distance, the four domains most similar to the target domain are identified and their general sentence embeddings are selected. The selected general sentence embeddings and the domain-specific embeddings are concatenated and fed into a dense layer for training. Evaluation results on public datasets spanning 16 different domains demonstrate the effectiveness of our model. In addition, we propose an active learning algorithm that first applies the elliptic envelope to remove outliers from a pool of unlabeled data, which the MUTUAL model then classifies. Next, the most uncertain data points are selected for labeling according to the least-confidence metric. Experiments show that querying 38% of the original data achieves higher accuracy than random sampling.
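To make the embedding step concrete, below is a minimal PyTorch sketch of a stacked BiLSTM encoder with additive attention that pools token states into a single sentence embedding. All sizes, the attention form, and the omission of the autoencoder's reconstruction decoder are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class AttentiveBiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # "Stacked" BiLSTM: num_layers > 1, bidirectional (hypothetical sizes)
        self.bilstm = nn.LSTM(emb_dim, hidden, num_layers=layers,
                              bidirectional=True, batch_first=True)
        self.att = nn.Linear(2 * hidden, 1)  # additive attention scorer

    def forward(self, token_ids):                  # (B, T) token ids
        h, _ = self.bilstm(self.embed(token_ids)) # (B, T, 2H) token states
        w = torch.softmax(self.att(h), dim=1)      # (B, T, 1) attention weights
        return (w * h).sum(dim=1)                  # (B, 2H) sentence embedding

enc = AttentiveBiLSTMEncoder()
emb = enc(torch.randint(0, 10000, (4, 20)))  # 4 sentences, 20 tokens each
print(emb.shape)  # torch.Size([4, 512])

In the full model, one such encoder would be trained as part of an autoencoder (with a decoder reconstructing the input), once for general and once for domain-specific representations.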
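The domain-selection step can be sketched as follows: pick the four source domains closest to the target by Jensen-Shannon distance. How each domain is summarized as a probability distribution is an assumption here (a normalized mean over embedding dimensions); the paper's exact construction may differ.

import numpy as np
from scipy.spatial.distance import jensenshannon

def domain_distribution(embeddings):
    """Collapse a domain's sentence embeddings into one probability vector
    (hypothetical summary: mean absolute value, normalized to sum to 1)."""
    v = np.abs(embeddings).mean(axis=0)
    return v / v.sum()

def top_k_similar(target_embs, source_embs, k=4):
    p = domain_distribution(target_embs)
    dists = {name: jensenshannon(p, domain_distribution(e))
             for name, e in source_embs.items()}
    return sorted(dists, key=dists.get)[:k]  # k smallest JS distances

rng = np.random.default_rng(0)
sources = {f"domain_{i}": rng.random((100, 512)) for i in range(16)}
print(top_k_similar(rng.random((100, 512)), sources, k=4))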
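The fusion step then reduces to concatenation plus a single dense layer. The "four general plus one domain-specific" layout follows the abstract; the embedding size and binary output are assumptions.

import torch
import torch.nn as nn

D = 512                                  # per-embedding size (assumed)
fuse = nn.Linear(5 * D, 2)               # 4 general + 1 specific -> pos/neg
general = [torch.randn(8, D) for _ in range(4)]  # from the 4 chosen domains
specific = torch.randn(8, D)                     # target-domain embedding
logits = fuse(torch.cat(general + [specific], dim=1))  # (8, 2) class logits
print(logits.shape)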
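Finally, a hedged sketch of the active-learning loop: scikit-learn's EllipticEnvelope drops outliers from the unlabeled pool, then the points whose top predicted class has the lowest probability (least confidence) are queried for labels. A LogisticRegression stands in for the MUTUAL classifier; the contamination rate and query budget are placeholders.

import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab, y_lab = rng.random((200, 16)), rng.integers(0, 2, 200)
X_pool = rng.random((1000, 16))          # unlabeled pool

# 1) Outlier removal: fit_predict returns +1 for inliers, -1 for outliers
env = EllipticEnvelope(contamination=0.05, random_state=0)
X_pool = X_pool[env.fit_predict(X_pool) == 1]

# 2) Least-confidence sampling: query points whose best class is least certain
clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
confidence = clf.predict_proba(X_pool).max(axis=1)
query_idx = np.argsort(confidence)[:50]  # 50 least-confident points to label
print(X_pool[query_idx].shape)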