使用机器学习技术分析Covid-19发病的情绪

IF 1.6 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal Pub Date : 2022-06-06 DOI:10.14201/adcaij.27348

Vishakha Arya, A. Mishra, Alfonso González-Briones

{"title":"使用机器学习技术分析Covid-19发病的情绪","authors":"Vishakha Arya, A. Mishra, Alfonso González-Briones","doi":"10.14201/adcaij.27348","DOIUrl":null,"url":null,"abstract":"The novel coronavirus (Covid-19) pandemic has struck the whole world and is one of the most striking topics on social media platforms. Sentiment outbreak on social media enduring various thoughts, opinions, and emotions about the Covid-19 disease, expressing views they are feeling presently. Analyzing sentiments helps to yield better results. Gathering data from different blogging sites like Facebook, Twitter, Weibo, YouTube, Instagram, etc., and Twitter is the largest repository. Videos, text, and audio were also collected from repositories. Sentiment analysis uses opinion mining to acquire the sentiments of its users and categorizes them accordingly as positive, negative, and neutral. Analytical and machine learning classification is implemented to 3586 tweets collected in different time frames. In this paper, sentiment analysis was performed on tweets accumulated during the Covid-19 pandemic, Coronavirus disease. Tweets are collected from the Twitter database using Hydrator a web-based application. Data-preprocessing removes all the noise, outliers from the raw data. With Natural Language Toolkit (NLTK), text classification for sentiment analysis and calculate the score subjective polarity, counts, and sentiment distribution. N-gram is used in textual mining -and Natural Language Processing for a continuous sequence of words in a text or document applying uni-gram, bi-gram, and tri-gram for statistical computation. Term frequency and Inverse document frequency (TF-IDF) is a feature extraction technique that converts textual data into numeric form. Vectorize data feed to our model to obtain insights from linguistic data. Linear SVC, MultinomialNB, GBM, and Random Forest classifier with Tfidf classification model applied to our proposed model. Linear Support Vector classification performs better than the other two classifiers. Results depict that RF performs better.\n ","PeriodicalId":42597,"journal":{"name":"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal","volume":"190 1","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Analysis of sentiments on the onset of Covid-19 using Machine Learning Techniques\",\"authors\":\"Vishakha Arya, A. Mishra, Alfonso González-Briones\",\"doi\":\"10.14201/adcaij.27348\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The novel coronavirus (Covid-19) pandemic has struck the whole world and is one of the most striking topics on social media platforms. Sentiment outbreak on social media enduring various thoughts, opinions, and emotions about the Covid-19 disease, expressing views they are feeling presently. Analyzing sentiments helps to yield better results. Gathering data from different blogging sites like Facebook, Twitter, Weibo, YouTube, Instagram, etc., and Twitter is the largest repository. Videos, text, and audio were also collected from repositories. Sentiment analysis uses opinion mining to acquire the sentiments of its users and categorizes them accordingly as positive, negative, and neutral. Analytical and machine learning classification is implemented to 3586 tweets collected in different time frames. In this paper, sentiment analysis was performed on tweets accumulated during the Covid-19 pandemic, Coronavirus disease. Tweets are collected from the Twitter database using Hydrator a web-based application. Data-preprocessing removes all the noise, outliers from the raw data. With Natural Language Toolkit (NLTK), text classification for sentiment analysis and calculate the score subjective polarity, counts, and sentiment distribution. N-gram is used in textual mining -and Natural Language Processing for a continuous sequence of words in a text or document applying uni-gram, bi-gram, and tri-gram for statistical computation. Term frequency and Inverse document frequency (TF-IDF) is a feature extraction technique that converts textual data into numeric form. Vectorize data feed to our model to obtain insights from linguistic data. Linear SVC, MultinomialNB, GBM, and Random Forest classifier with Tfidf classification model applied to our proposed model. Linear Support Vector classification performs better than the other two classifiers. Results depict that RF performs better.\\n \",\"PeriodicalId\":42597,\"journal\":{\"name\":\"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal\",\"volume\":\"190 1\",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2022-06-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14201/adcaij.27348\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14201/adcaij.27348","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 12

摘要

新型冠状病毒(Covid-19)大流行席卷全球，成为社交媒体平台上最引人注目的话题之一。围绕新冠肺炎的各种想法、意见、情绪，在社交媒体(sns)上表达自己目前的感受的情绪爆发。分析情绪有助于产生更好的结果。从不同的博客网站收集数据，如Facebook、Twitter、微博、YouTube、Instagram等，Twitter是最大的存储库。视频、文本和音频也从存储库中收集。情感分析使用意见挖掘来获取用户的情感，并相应地将其分类为积极、消极和中立。对在不同时间框架内收集的3586条推文进行分析和机器学习分类。本文对Covid-19大流行期间积累的推文进行了情绪分析。使用基于web的应用程序Hydrator从Twitter数据库收集Tweets。数据预处理消除了原始数据中的所有噪声和异常值。利用自然语言工具包(NLTK)，对文本分类进行情感分析，并计算得分主观极性、计数和情感分布。N-gram用于文本挖掘和自然语言处理，用于文本或文档中的连续单词序列，应用单gram、双gram和三gram进行统计计算。术语频率和逆文档频率(TF-IDF)是一种将文本数据转换为数字形式的特征提取技术。向量化数据馈送到我们的模型中，以从语言数据中获得洞察力。线性SVC、多项式nb、GBM和随机森林分类器与Tfidf分类模型应用于我们提出的模型。线性支持向量分类比其他两种分类器性能更好。结果表明，射频性能较好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Analysis of sentiments on the onset of Covid-19 using Machine Learning Techniques

The novel coronavirus (Covid-19) pandemic has struck the whole world and is one of the most striking topics on social media platforms. Sentiment outbreak on social media enduring various thoughts, opinions, and emotions about the Covid-19 disease, expressing views they are feeling presently. Analyzing sentiments helps to yield better results. Gathering data from different blogging sites like Facebook, Twitter, Weibo, YouTube, Instagram, etc., and Twitter is the largest repository. Videos, text, and audio were also collected from repositories. Sentiment analysis uses opinion mining to acquire the sentiments of its users and categorizes them accordingly as positive, negative, and neutral. Analytical and machine learning classification is implemented to 3586 tweets collected in different time frames. In this paper, sentiment analysis was performed on tweets accumulated during the Covid-19 pandemic, Coronavirus disease. Tweets are collected from the Twitter database using Hydrator a web-based application. Data-preprocessing removes all the noise, outliers from the raw data. With Natural Language Toolkit (NLTK), text classification for sentiment analysis and calculate the score subjective polarity, counts, and sentiment distribution. N-gram is used in textual mining -and Natural Language Processing for a continuous sequence of words in a text or document applying uni-gram, bi-gram, and tri-gram for statistical computation. Term frequency and Inverse document frequency (TF-IDF) is a feature extraction technique that converts textual data into numeric form. Vectorize data feed to our model to obtain insights from linguistic data. Linear SVC, MultinomialNB, GBM, and Random Forest classifier with Tfidf classification model applied to our proposed model. Linear Support Vector classification performs better than the other two classifiers. Results depict that RF performs better.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊