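A Text Augmentation Approach using Similarity Measures based on Neural Sentence Embeddings for Emotion Classification on Microblogs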

2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET). Pub Date: 2020-09-26. DOI: 10.1109/IICAIET49801.2020.9257826

Yong Kuan Shyang, Jasy Liew Suet Yan

Citations: 1

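Abstract

Machine learning models for fine-grained emotion classification can benefit from a larger pool of training data, but manually expanding an emotion corpus for training is labor-intensive and time-consuming. While distant supervision provides a viable alternative, the resulting self-labeled emotion corpus is susceptible to a high level of noise. This paper introduces a text augmentation method that efficiently expands the set of positive training examples by harnessing tweets collected through distant supervision (DS) that are similar to a small set of gold standard seed tweets. Tweets labeled with happiness in EmoTweet-28 (ET) serve as the gold standard seeds, and the training data is augmented with similar DS tweets containing happiness hashtags. Three pre-trained sentence encoders are used to encode the tweets into multidimensional vectors for similarity scoring between each DS:ET-seed pair. DS tweets with similarity scores exceeding a predefined threshold are added to an augmented set, which is then used to train a linear SVM classifier to distinguish between happiness and non-happiness. The proposed text augmentation method proved to be a more effective approach, one that can leverage quality training data in larger quantities contributed by both carefully curated and distant supervision emotion corpora.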
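The similarity-based selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_encode` is a hypothetical bag-of-words stand-in for the pre-trained sentence encoders (which the abstract does not name), cosine similarity is an assumed choice of similarity measure, and taking the best score over all seeds is an assumed way of combining the DS:ET-seed pair scores. The example tweets and the threshold value are invented for illustration.

```python
import math
from collections import Counter


def toy_encode(text):
    """Hypothetical stand-in for a pre-trained sentence encoder.

    Returns a sparse bag-of-words count vector; a real encoder would
    return a dense multidimensional embedding instead.
    """
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def augment(seed_tweets, ds_tweets, threshold, encode=toy_encode):
    """Keep DS tweets whose best similarity to any seed exceeds the threshold.

    Using the maximum over seeds is an assumption; the abstract only says
    that each DS:ET-seed pair is scored.
    """
    seed_vecs = [encode(seed) for seed in seed_tweets]
    selected = []
    for tweet in ds_tweets:
        vec = encode(tweet)
        best = max((cosine(vec, seed) for seed in seed_vecs), default=0.0)
        if best > threshold:
            selected.append(tweet)
    return selected


# Invented example data: gold standard seeds and hashtag-labeled DS tweets.
seeds = ["so happy today", "feeling great and happy"]
ds_tweets = [
    "so happy with my new job today #happy",
    "stuck in traffic again #happy",
    "feeling great after the run #happy",
]
augmented = augment(seeds, ds_tweets, threshold=0.3)
```

In this sketch the second DS tweet carries the happiness hashtag but shares no words with either seed, so it is filtered out as likely noise. The selected tweets would then be combined with the gold standard seeds as positive examples for training a linear SVM, for instance with scikit-learn's `LinearSVC`, although the abstract does not say which library was used.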

Source journal

2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET)

Self-citation rate

0.00%

Number of publications