LEIA: Linguistic Embeddings for the Identification of Affect.

Impact factor: 3; Zone 2 (Computer Science); JCR Q1 (MATHEMATICS, INTERDISCIPLINARY APPLICATIONS); EPJ Data Science; Pub Date: 2023-01-01; Epub Date: 2023-11-16; DOI: 10.1140/epjds/s13688-023-00427-0
Segun Taofeek Aroyehun, Lukas Malik, Hannah Metzler, Nikolas Haimerl, Anna Di Natale, David Garcia
EPJ Data Science 12(1), 52 (2023). Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10654159/pdf/

Abstract

The wealth of text data generated by social media has enabled new kinds of analysis of emotions with language models. These models are often trained on small and costly datasets of text annotations produced by readers who guess the emotions expressed by others in social media posts. This affects the quality of emotion identification methods due to training data size limitations and noise in the production of labels used in model development. We present LEIA, a model for emotion identification in text that has been trained on a dataset of more than 6 million posts with self-annotated emotion labels for happiness, affection, sadness, anger, and fear. LEIA is based on a word masking method that enhances the learning of emotion words during model pre-training. LEIA achieves macro-F1 values of approximately 73 on three in-domain test datasets, outperforming other supervised and unsupervised methods in a strong benchmark that shows that LEIA generalizes across posts, users, and time periods. We further perform an out-of-domain evaluation on five different datasets of social media and other sources, showing LEIA's robust performance across media, data collection methods, and annotation schemes. Our results show that LEIA generalizes its classification of anger, happiness, and sadness beyond the domain it was trained on. LEIA can be applied in future research to provide better identification of emotions in text from the perspective of the writer.
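The macro-F1 score reported above is the unweighted mean of the per-class F1 scores, so each of the five emotions counts equally regardless of how often it occurs. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the toy labels are assumptions made for illustration. The function returns a value between 0 and 1, so a reported value of about 73 corresponds to roughly 0.73 on this scale.

```python
from typing import Sequence

# The five emotion labels named in the abstract.
LABELS = ("happiness", "affection", "sadness", "anger", "fear")


def macro_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = LABELS,
) -> float:
    """Unweighted mean of per-label F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never appears.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = ["happiness", "happiness", "sadness", "anger", "fear", "affection"]
    pred = ["happiness", "sadness", "sadness", "anger", "fear", "affection"]
    print(f"macro-F1 = {macro_f1(gold, pred):.4f}")
```

Because every class contributes equally to the average, a model that does well only on the most frequent emotion cannot reach a high macro-F1.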

Source journal
EPJ Data Science (MATHEMATICS, INTERDISCIPLINARY APPLICATIONS)
CiteScore: 6.10
Self-citation rate: 5.60%
Articles published: 53
Review time: 13 weeks
Journal description: EPJ Data Science covers a broad range of research areas and applications and particularly encourages contributions from techno-socio-economic systems, where it comprises those research lines that now regard the digital “tracks” of human beings as first-order objects for scientific investigation. Topics include, but are not limited to, human behavior, social interaction (including animal societies), economic and financial systems, management and business networks, socio-technical infrastructure, health and environmental systems, the science of science, as well as general risk and crisis scenario forecasting up to and including policy advice.