Distributed Sentiment Analysis for Geo-Tagged Twitter Data

Muhammed Said Zengin, Rabia Arslan, Mehmet Burak Akgün
2022 30th Signal Processing and Communications Applications Conference (SIU), 15 May 2022. DOI: 10.1109/SIU55565.2022.9864702

Abstract

The ever-increasing frequency of sharing on social media has made these platforms one of the primary data sources for computational social science. Examining and analyzing large-scale social media datasets is likewise crucial for governments as well as companies. However, as the volume of data grows, deriving insights from it with artificial-intelligence-based models demands more and more processing power, and hardware requirements can increase dramatically when those insights are needed under real-time or near-real-time constraints. In this study, we developed a distributed sentiment analysis model that operates on a large social media dataset. A total of 16 million tweets were collected and grouped by originating city. The sentiment analysis model was produced by fine-tuning a pre-trained BERT model, and the distributed big data analytics engine Apache Spark was used to execute the trained model in a distributed fashion. For evaluation, the prediction time on a single compute unit was compared with the distributed prediction time. The sentiment analysis model was executed separately for each of the data groups corresponding to Turkey's 81 provinces. The 16-million-tweet dataset used in this study, the resulting Turkish sentiment analysis model, the distributed prediction code developed for Apache Spark, and all results of the study are available at https://distributed-sentiment-analysis.github.io/.
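The pipeline summarized above follows a split-apply-combine pattern: tweets are grouped by originating city, the classifier is applied to each group, and per-group results are collected. The following is a minimal sketch of that structure in plain Python, not the authors' code (theirs is available at the URL given in the abstract). It rests on several assumptions: the fine-tuned BERT model is replaced by a hypothetical toy lexicon scorer (`predict`), the per-city result is assumed to be the share of tweets labeled positive, and a thread pool stands in for Spark executors.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical stand-in for the fine-tuned BERT classifier: a toy Turkish
# lexicon scorer so the sketch runs without model weights.
# Returns 1 (positive) if any listed word occurs, else 0 (not positive).
POSITIVE = {"güzel", "harika", "mutlu"}


def predict(text: str) -> int:
    words = text.lower().split()
    return int(any(w in POSITIVE for w in words))


def group_by_city(tweets: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split step: bucket (city, text) pairs by originating city."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for city, text in tweets:
        groups[city].append(text)
    return dict(groups)


def score_group(item: Tuple[str, List[str]]) -> Tuple[str, float]:
    """Apply step: classify every tweet of one city, return its positive share."""
    city, texts = item
    labels = [predict(t) for t in texts]
    return city, sum(labels) / len(labels)


def score_all(tweets: List[Tuple[str, str]], workers: int = 1) -> Dict[str, float]:
    """Combine step: score all city groups, sequentially or on a thread pool."""
    groups = group_by_city(tweets)
    if workers == 1:
        # Single compute unit: groups are processed one after another.
        return dict(map(score_group, groups.items()))
    # Parallel path (assumed stand-in for distributed executors).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score_group, groups.items()))


if __name__ == "__main__":
    sample = [
        ("Ankara", "Hava çok güzel"),
        ("Ankara", "Trafik berbat"),
        ("İzmir", "Harika bir gün"),
    ]
    print(score_all(sample))             # {'Ankara': 0.5, 'İzmir': 1.0}
    print(score_all(sample, workers=4))  # same result, computed in parallel
```

On a real cluster, the per-group step could map onto Spark, for example via `groupBy(...).applyInPandas(...)` in PySpark; which Spark API the authors actually used is not stated in the abstract. Timing `score_all` with `workers=1` against a larger pool mirrors the paper's single-unit versus distributed comparison only in spirit, since a thread pool on one machine is not a measurement of a cluster.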