Combining sentiment analysis classifiers to explore multilingual news articles covering London 2012 and Rio 2016 Olympics.

Caio Mello, Gullal S Cheema, Gaurish Thakkar
International journal of digital humanities, pp. 1-27. Journal Article. Published 2022-11-16.
DOI: 10.1007/s42803-022-00052-9 (https://doi.org/10.1007/s42803-022-00052-9)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9667437/pdf/

Abstract

This study presents an approach to the challenges of applying Sentiment Analysis (SA) to news articles in a multilingual corpus. It examines the use and combination of multiple algorithms to explore news articles published in English and Portuguese. The methodology starts by evaluating and combining four SA algorithms (SenticNet, SentiStrength, Vader and BERT, with BERT trained on two datasets) to improve the quality of outputs. A thorough review of the algorithms' limitations is conducted using SHAP, an explainable AI tool, resulting in a list of issues that researchers must consider before using SA to interpret texts. We propose a combination of the three best classifiers (Vader, Amazon BERT and Sent140 BERT) to identify contradictory results, improving the quality of the positive, neutral and negative labels assigned to the texts. Challenges with translation are addressed, indicating possible solutions for non-English corpora. As a case study, the method is applied to the media coverage of the London 2012 and Rio 2016 Olympic legacies. The combination of different classifiers proved effective, revealing the imbalance between the media coverage of London 2012, which was much more positive, and that of Rio 2016, which was more negative.
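
The abstract does not state the exact rule used to combine the three classifiers, so the following is only a minimal sketch under stated assumptions: each classifier is assumed to have already produced one of the labels positive, neutral or negative for a text; a strict majority vote picks the combined label; and a text is flagged as contradictory when at least one classifier says positive while another says negative. The function name `combine_labels` and the classifier keys are hypothetical and are not taken from the paper.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def combine_labels(votes):
    """Combine per-classifier labels for one text.

    votes: dict mapping a classifier name (hypothetical keys) to its label.
    Returns (label, contradictory), where label is the strict-majority label
    or None when no label has a majority, and contradictory is True when
    the classifiers disagree in polarity (positive versus negative).
    """
    if not votes:
        raise ValueError("at least one classifier label is required")
    for name, label in votes.items():
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} from {name!r}")

    counts = Counter(votes.values())
    # Assumption: a contradiction means opposite polarities for the same text.
    contradictory = counts["positive"] > 0 and counts["negative"] > 0

    top_label, top_count = counts.most_common(1)[0]
    # Assumption: keep a label only if more than half of the classifiers agree.
    label = top_label if top_count > len(votes) / 2 else None
    return label, contradictory


if __name__ == "__main__":
    example = {
        "vader": "positive",
        "amazon_bert": "positive",
        "sent140_bert": "negative",
    }
    print(combine_labels(example))
```

In this sketch, texts flagged as contradictory could be set aside for manual inspection rather than trusted outright, which is one plausible reading of how identifying contradictory results could improve label quality.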

