Adaptation of Domain-Specific Transformer Models with Text Oversampling for Sentiment Analysis of Social Media Posts on COVID-19 Vaccine

Computer Science-AGH; IF 0.3; JCR Q4 (COMPUTER SCIENCE, THEORY & METHODS); Pub Date: 2023-03-10; DOI: 10.7494/csci.2023.24.2.4761

Anmol Bansal, Arjun Choudhry, Anubhav Sharma, Seba Susan


Citations: 2

Abstract

COVID-19 has spread across the world, and many different vaccines have been developed to counter its surge. To identify the sentiments associated with these vaccines in social media posts, this paper fine-tunes pre-trained transformer models on tweets about different COVID-19 vaccines: RoBERTa, XLNet and BERT, which are recently introduced state-of-the-art bi-directional transformer models, and the domain-specific transformer models BERTweet and CT-BERT, which are pre-trained on COVID-19 tweets. We further explore data augmentation by text oversampling with LMOTE to improve the accuracy of these models, specifically for small datasets with an imbalanced class distribution among the positive, negative and neutral sentiment classes. Our results summarize the suitability of text oversampling for the imbalanced, small-sample datasets used to fine-tune state-of-the-art pre-trained transformer models, and the utility of domain-specific transformer models for the classification task.
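To make the class-imbalance problem concrete, the following is a minimal Python sketch of naive random oversampling for a three-class sentiment dataset. It is not LMOTE, the technique named in the abstract, and it is not the authors' code: it only duplicates existing minority-class tweets, whereas the abstract describes oversampling as data augmentation. The function name `random_oversample` and the toy tweets are hypothetical and serve only as illustration.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples until every class matches the largest.

    Assumption: this is plain random oversampling, shown only to illustrate
    class balancing. It is NOT the LMOTE method referred to in the abstract.
    """
    rng = random.Random(seed)

    # Group the texts by their sentiment label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # Every class is padded up to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy data: 4 positive, 2 neutral, 1 negative tweet.
texts = [
    "got my second dose today, feeling great",
    "so grateful for the vaccine rollout",
    "vaccinated and relieved",
    "booster done, easy process",
    "vaccine appointments open next week",
    "the clinic is on main street",
    "my arm hurts and I regret this",
]
labels = ["positive"] * 4 + ["neutral"] * 2 + ["negative"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

In a fine-tuning setup, balancing of this kind would normally be applied to the training split only, so that validation and test sets keep the original class distribution.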
