Comparison of Different Modeling Techniques for Flemish Twitter Sentiment Analysis

Big data analytics. Pub Date: 2022-10-18. DOI: 10.3390/analytics1020009

Manon Reusens, Michael Reusens, Marc Callens, S. vanden Broucke, B. Baesens


Citations: 1

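Abstract

Microblogging websites such as Twitter have caused sentiment analysis research to increase in popularity over the last several decades. However, most studies focus on the English language, which leaves other languages underrepresented. Therefore, in this paper, we compare several modeling techniques for sentiment analysis using a new dataset containing Flemish tweets. The key contribution of our paper lies in its innovative experimental design: we compared different preprocessing techniques and vector representations to find the best-performing combination for a Flemish dataset. We compared models belonging to four different categories: lexicon-based methods, traditional machine-learning models, neural networks, and attention-based models. We found that more preprocessing leads to better results, but the best-performing vector representation approach depends on the model applied. Moreover, an immense gap was observed between the performances of the lexicon-based approaches and those of the other models. The traditional machine-learning approaches and the neural networks produced similar results, but the attention-based model was the best-performing technique. Nevertheless, a tradeoff should be made between computational expenses and performance gains.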
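As a rough illustration of two ingredients named in the abstract, tweet preprocessing and a lexicon-based method, the following minimal sketch may help. Everything in it is an assumption made for illustration: the toy lexicon, the function names (`preprocess`, `bag_of_words`, `lexicon_label`), and the specific preprocessing steps are hypothetical and are not taken from the paper, which does not describe its pipeline at this level of detail here.

```python
import re
from collections import Counter

# Hypothetical toy lexicon of Dutch words with polarity scores.
# This is NOT the lexicon used in the paper; it exists only for the demo.
TOY_LEXICON = {"goed": 1.0, "leuk": 1.0, "super": 1.0,
               "slecht": -1.0, "saai": -1.0}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"\w+")


def preprocess(tweet, heavy=True):
    """Split a tweet into word tokens.

    With heavy=True (an assumed "more preprocessing" setting), URLs and
    @mentions are removed, '#' signs are dropped, and text is lowercased.
    With heavy=False, the raw text is only tokenized.
    """
    text = tweet
    if heavy:
        text = URL_RE.sub(" ", text)
        text = MENTION_RE.sub(" ", text)
        text = text.replace("#", " ")
        text = text.lower()
    return TOKEN_RE.findall(text)


def bag_of_words(tokens):
    """One simple vector representation: a sparse token-count mapping."""
    return Counter(tokens)


def lexicon_label(tokens, lexicon=TOY_LEXICON):
    """Sum per-token polarity scores and map the total to a label."""
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweet = "SUPER leuk programma! https://t.co/x @jan #tv"
    for heavy in (False, True):
        tokens = preprocess(tweet, heavy=heavy)
        print(heavy, tokens, lexicon_label(tokens))
```

In this toy setting, lowercasing lets "SUPER" match the lexicon entry "super", which hints at why preprocessing choices can change a model's output. The count vector from `bag_of_words` is the kind of input a traditional machine-learning classifier could consume; the attention-based models mentioned in the abstract typically rely on their own tokenizers instead.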




Source journal

Big data analytics

Self-citation rate

0.00%

Review time

5 weeks