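A Text Augmentation Approach using Similarity Measures based on Neural Sentence Embeddings for Emotion Classification on Microblogs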

Yong Kuan Shyang, Jasy Liew Suet Yan
Published in: 2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET)
Publication date: 2020-09-26
DOI: 10.1109/IICAIET49801.2020.9257826 (https://doi.org/10.1109/IICAIET49801.2020.9257826)
Citations: 1

Abstract

A Text Augmentation Approach using Similarity Measures based on Neural Sentence Embeddings for Emotion Classification on Microblogs
Machine learning models for fine-grained emotion classification benefit from a larger pool of training data, but manually expanding an emotion corpus is labor-intensive and time-consuming. Distant supervision offers a viable alternative, but the resulting self-labeled emotion corpus is susceptible to a high level of noise. This paper introduces a text augmentation method that efficiently expands the set of positive training examples by harnessing tweets collected through distant supervision (DS) that are similar to a small set of gold-standard seed tweets. Tweets labeled with happiness in EmoTweet-28 (ET) serve as the gold-standard seeds, and the training data is augmented with similar DS tweets that contain happiness hashtags. Three pre-trained sentence encoders encode the tweets into multidimensional vectors, which are used to compute a similarity score for each DS:ET-seed pair. DS tweets whose similarity scores exceed a predefined threshold are added to an augmented set, which is then used to train a linear SVM classifier to distinguish happiness from non-happiness. The proposed text augmentation method proved to be a more effective approach, one that can leverage larger quantities of quality training data drawn from both carefully curated and distantly supervised emotion corpora.
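The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the three sentence encoders, the similarity measure, the threshold value, or how scores are combined across seeds, so everything below is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a pre-trained sentence encoder, cosine similarity is the assumed measure, a DS tweet is kept when its best score against any seed exceeds the threshold, and the example tweets and the 0.5 threshold are invented. Training the linear SVM on the augmented set is not shown.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder (assumption): word-count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_similar(ds_tweets, seed_tweets, threshold):
    """Keep DS tweets whose best similarity to any seed exceeds the threshold.

    Taking the maximum over seeds is an assumption; the abstract only says
    that each DS:ET-seed pair is scored.
    """
    seed_vecs = [embed(s) for s in seed_tweets]
    selected = []
    for tweet in ds_tweets:
        vec = embed(tweet)
        best = max((cosine(vec, s) for s in seed_vecs), default=0.0)
        if best > threshold:
            selected.append(tweet)
    return selected


# Hypothetical data: gold-standard seeds and hashtag-labeled DS tweets.
seeds = ["so happy today", "feeling happy and grateful"]
ds = [
    "so happy with my new job today #happy",
    "traffic is terrible #happy",
    "grateful and happy for friends",
]

# Augmented positive set: the seeds plus the DS tweets that pass the filter.
augmented = seeds + select_similar(ds, seeds, threshold=0.5)
print(augmented)
```

In this toy run the second DS tweet carries the happiness hashtag but shares no words with either seed, so it is filtered out. This is the kind of noisy self-labeled example the similarity threshold is meant to exclude.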