Sentiment Analysis of "AUTOSTRADA.INFO/RU" Users’ Comments

SPIIRAS Proceedings (JCR Q3, Mathematics). Pub Date: 2019-04-12. DOI: 10.15622/SP.18.2.354-389
Y. Seliverstov, Viktoriya Chigur, Arseniy Sazanov, S. Seliverstov, A. Svistunova
Citations: 4

Abstract

Analysis shows that social networks (Vkontakte, Facebook), thematic communities in microblogging networks (Twitter), resources for travelers (TripAdvisor), and transport portals (Autostrada) are sources of up-to-date, operational information about traffic conditions, the quality of transport services, and passenger satisfaction with those services. However, existing transport monitoring systems lack software tools for collecting and analyzing traffic information published on the Internet. This paper addresses the task of building a system that automatically retrieves and classifies road traffic information from transport Internet portals, and of testing that system on the transport networks of Crimea and the city of Sevastopol. To this end, open source libraries for thematic data collection and analysis were reviewed, and an algorithm for extracting and analyzing texts was developed. A crawler was written with the Scrapy package in Python 3, and user feedback on the state of the transport system of Crimea and the city of Sevastopol was collected from the portal http://autostrada.info/ru. For text lemmatization and vectorization, the tf, idf, and tf-idf weighting schemes and their Scikit-Learn implementations, CountVectorizer and TfidfVectorizer, were considered. For text representation, Bag-of-Words and n-gram models were considered. Two classifier models were developed: a naive Bayes classifier (MultinomialNB) and a linear classifier trained with stochastic gradient descent (SGDClassifier). The training sample was a corpus of 225,000 labeled texts from Twitter. The classifiers were trained using cross-validation with the ShuffleSplit method, and the results of sentiment classification were tested and compared. According to the validation results, the linear model with the n-gram range [1, 3] (unigrams through trigrams) and the TF-IDF vectorizer performed best. In a trial of the developed system, reviews concerning the quality of the transport networks of the Republic of Crimea and the city of Sevastopol were collected and analyzed. Conclusions are drawn, and prospects for further functional development of the tools are outlined.
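To make the feature extraction step concrete, the following is a minimal, standard-library sketch of n-gram extraction over the range [1, 3] combined with tf-idf weighting. It is not the authors' code: the function names `ngrams` and `tfidf`, the whitespace tokenizer, the three toy comments, and the unsmoothed formula tf-idf = count × ln(N / df) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=3):
    """Return all contiguous n-grams, as space-joined strings, for n in [n_min, n_max]."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_min=1, n_max=3):
    """Weight each n-gram of each document by raw count * ln(N / document frequency)."""
    # Bag of n-grams per document (hypothetical tokenizer: lowercase + whitespace split).
    bags = [Counter(ngrams(doc.lower().split(), n_min, n_max)) for doc in docs]
    # Document frequency: iterating a Counter yields each distinct term once per document.
    df = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
        for bag in bags
    ]


# Toy comments, invented for illustration only (not data from the paper).
comments = ["road is good", "road is bad", "good asphalt"]
weights = tfidf(comments)

# Show the three highest-weighted n-grams of the first comment.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1])[:3]:
    print(f"{term!r}: {weight:.3f}")
```

N-grams that occur in only one comment, such as "is good", receive the highest weight, while terms shared across comments are down-weighted. Scikit-Learn's TfidfVectorizer, which the abstract names, accepts the same range as `ngram_range=(1, 3)` but by default applies a smoothed idf and L2 normalization, so its numeric values will differ from this sketch.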
Source journal
SPIIRAS Proceedings (Mathematics: Applied Mathematics)
CiteScore: 1.90
Self-citation rate: 0.00%
Articles published: 0
Review time: 14 weeks
About the journal: SPIIRAS Proceedings publishes scientific, scientific-educational, and popular-science papers on computer science, automation, applied mathematics, and interdisciplinary research, as well as information technology, the theoretical foundations of computer science (mathematical and those related to other scientific disciplines), information security and information protection, decision making and artificial intelligence, mathematical modeling, and informatization.