Adaptive training instance selection for cross-domain emotion identification

Wenbo Wang, Lu Chen, Keke Chen, K. Thirunarayan, A. Sheth
Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, 2017-08-23
DOI: 10.1145/3106426.3106457
Citations: 4

Abstract

This paper exploits a large number of self-labeled emotion tweets as source-domain training data to improve emotion identification in target domains (i.e., blogs and fairy tales) where labeled data are scarce. Because self-labeled emotion training data are noisy and ambiguous, existing domain adaptation methods, which typically depend on high-quality labeled source-domain data, do not work satisfactorily. This paper describes an adaptive source-domain training instance selection method to address the problem of noisy source-domain training data. The proposed approach can effectively identify the most informative training examples based on three carefully designed measures: consistency, diversity, and similarity. It uses an iterative method in which each iteration (1) selects informative samples from the source domain using the informativeness measures, (2) merges them with the target-domain training data, (3) evaluates the performance of the learned classifier on the target domain, and (4) updates the informativeness measures for the next iteration. The process stops when no new training instance is selected or after a designated number of iterations. Experiments show that our approach is effective for cross-domain emotion identification and consistently outperforms baseline approaches across four domains.
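The iterative selection loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract names the three measures but does not define them, so the definitions used here are hypothetical: consistency is agreement between a sample's self-label and the current classifier, similarity is cosine similarity to the centroid of the target-domain training data, and diversity is one minus the maximum similarity to already-selected samples. The equal weights, the threshold, the batch size `k`, the nearest-centroid stand-in classifier, and the rule that a batch is kept only if target-domain accuracy does not drop are likewise illustrative choices, as are all function names (e.g., `select_adaptively`). Requires Python 3.9+.

```python
import math
from collections import defaultdict

Vector = list[float]
Example = tuple[Vector, str]  # (feature vector, emotion label)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors: list[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_centroids(data: list[Example]) -> dict[str, Vector]:
    """Hypothetical stand-in classifier: one centroid per emotion label."""
    by_label: dict[str, list[Vector]] = defaultdict(list)
    for x, y in data:
        by_label[y].append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(model: dict[str, Vector], x: Vector) -> str:
    """Label of the most cosine-similar centroid."""
    return max(model, key=lambda y: cosine(model[y], x))


def accuracy(model: dict[str, Vector], data: list[Example]) -> float:
    return sum(predict(model, x) == y for x, y in data) / len(data)


def informativeness(x, y, model, target_center, selected, w=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of the three measures named in the abstract."""
    # consistency: does the current classifier agree with the self-assigned label?
    consistency = 1.0 if predict(model, x) == y else 0.0
    # similarity: closeness to the centroid of the target-domain training data
    similarity = cosine(x, target_center)
    # diversity: novelty relative to instances selected in earlier iterations
    diversity = 1.0 - max((cosine(x, s) for s, _ in selected), default=0.0)
    return w[0] * consistency + w[1] * similarity + w[2] * diversity


def select_adaptively(source, target_train, target_dev, k=2, threshold=2.0, max_iters=10):
    pool = list(source)
    selected: list[Example] = []
    target_center = mean([x for x, _ in target_train])
    model = train_centroids(target_train)
    best_acc = accuracy(model, target_dev)
    for _ in range(max_iters):  # stopping rule 2: designated number of iterations
        scored = sorted(
            ((informativeness(x, y, model, target_center, selected), i)
             for i, (x, y) in enumerate(pool)),
            reverse=True,
        )
        chosen = {i for score, i in scored[:k] if score >= threshold}
        if not chosen:
            break  # stopping rule 1: no new training instance selected
        batch = [ex for i, ex in enumerate(pool) if i in chosen]
        # chosen instances leave the pool whether or not they are kept
        pool = [ex for i, ex in enumerate(pool) if i not in chosen]
        # merge with target-domain training data, evaluate on held-out target data
        candidate = train_centroids(target_train + selected + batch)
        acc = accuracy(candidate, target_dev)
        if acc >= best_acc:  # assumption: keep the batch only if accuracy does not drop
            selected += batch
            model, best_acc = candidate, acc
        # measures are recomputed from the updated model and selection next pass
    return selected, model, best_acc


# Toy usage with hypothetical 2-D feature vectors
target_train = [([1.0, 0.1], "joy"), ([0.1, 1.0], "anger")]
target_dev = [([0.9, 0.2], "joy"), ([0.2, 0.9], "anger")]
source = [
    ([1.0, 0.0], "joy"),
    ([0.0, 1.0], "anger"),
    ([1.0, 0.05], "anger"),  # noisy self-label: the features look like "joy"
]
selected, model, acc = select_adaptively(source, target_train, target_dev)
print([label for _, label in selected], acc)
```

In the toy data, the third source instance carries a self-label that contradicts its features, so its consistency score is 0 and it never clears the threshold; the loop then stops in its second iteration because no new instance is selected, mirroring the first stopping rule in the abstract.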