佛兰德语Twitter情感分析的不同建模技术比较

Manon Reusens, Michael Reusens, Marc Callens, S. vanden Broucke, B. Baesens
{"title":"佛兰德语Twitter情感分析的不同建模技术比较","authors":"Manon Reusens, Michael Reusens, Marc Callens, S. vanden Broucke, B. Baesens","doi":"10.3390/analytics1020009","DOIUrl":null,"url":null,"abstract":"Microblogging websites such as Twitter have caused sentiment analysis research to increase in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset containing Flemish tweets. The key contribution of our paper lies in its innovative experimental design: we compared different preprocessing techniques and vector representations to find the best-performing combination for a Flemish dataset. We compared models belonging to four different categories: lexicon-based methods, traditional machine-learning models, neural networks, and attention-based models. We found that more preprocessing leads to better results, but the best-performing vector representation approach depends on the model applied. Moreover, an immense gap was observed between the performances of the lexicon-based approaches and those of the other models. The traditional machine learning approaches and the neural networks produced similar results, but the attention-based model was the best-performing technique. Nevertheless, a tradeoff should be made between computational expenses and performance gains.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"249 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Comparison of Different Modeling Techniques for Flemish Twitter Sentiment Analysis\",\"authors\":\"Manon Reusens, Michael Reusens, Marc Callens, S. vanden Broucke, B. Baesens\",\"doi\":\"10.3390/analytics1020009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Microblogging websites such as Twitter have caused sentiment analysis research to increase in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset containing Flemish tweets. The key contribution of our paper lies in its innovative experimental design: we compared different preprocessing techniques and vector representations to find the best-performing combination for a Flemish dataset. We compared models belonging to four different categories: lexicon-based methods, traditional machine-learning models, neural networks, and attention-based models. We found that more preprocessing leads to better results, but the best-performing vector representation approach depends on the model applied. Moreover, an immense gap was observed between the performances of the lexicon-based approaches and those of the other models. The traditional machine learning approaches and the neural networks produced similar results, but the attention-based model was the best-performing technique. Nevertheless, a tradeoff should be made between computational expenses and performance gains.\",\"PeriodicalId\":93078,\"journal\":{\"name\":\"Big data analytics\",\"volume\":\"249 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Big data analytics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/analytics1020009\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big data analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/analytics1020009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

在过去的几十年里,像Twitter这样的微博网站使得情感分析研究越来越受欢迎。然而,大多数研究都集中在英语上,这使得其他语言的代表性不足。因此,在本文中,我们使用包含弗拉芒语推文的新数据集比较了几种情感分析建模技术。我们论文的关键贡献在于其创新的实验设计:我们比较了不同的预处理技术和向量表示,以找到弗拉芒语数据集的最佳表现组合。我们比较了四种不同类别的模型:基于词典的方法、传统的机器学习模型、神经网络和基于注意力的模型。我们发现,更多的预处理会带来更好的结果,但最佳表现的向量表示方法取决于所应用的模型。此外,基于词典的方法与其他模型的性能之间存在巨大差距。传统的机器学习方法和神经网络产生了类似的结果,但基于注意力的模型是表现最好的技术。然而,应该在计算开销和性能增益之间进行权衡。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Comparison of Different Modeling Techniques for Flemish Twitter Sentiment Analysis
Microblogging websites such as Twitter have caused sentiment analysis research to increase in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset containing Flemish tweets. The key contribution of our paper lies in its innovative experimental design: we compared different preprocessing techniques and vector representations to find the best-performing combination for a Flemish dataset. We compared models belonging to four different categories: lexicon-based methods, traditional machine-learning models, neural networks, and attention-based models. We found that more preprocessing leads to better results, but the best-performing vector representation approach depends on the model applied. Moreover, an immense gap was observed between the performances of the lexicon-based approaches and those of the other models. The traditional machine learning approaches and the neural networks produced similar results, but the attention-based model was the best-performing technique. Nevertheless, a tradeoff should be made between computational expenses and performance gains.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
审稿时长
5 weeks
期刊最新文献
A Comparative Analysis of VirLock and Bacteriophage ϕ6 through the Lens of Game Theory Can Oral Grades Predict Final Examination Scores? Case Study in a Higher Education Military Academy Relating the Ramsay Quotient Model to the Classical D-Scoring Rule An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail Market A Novel Curve Clustering Method for Functional Data: Applications to COVID-19 and Financial Data
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1