Adaptation of Domain-Specific Transformer Models with Text Oversampling for Sentiment Analysis of Social Media Posts on COVID-19 Vaccine

Journal: Computer Science-AGH; IF: 0.3; JCR: Q4 (Computer Science, Theory & Methods); Pub Date: 2023-03-10; DOI: 10.7494/csci.2023.24.2.4761
Anmol Bansal, Arjun Choudhry, Anubhav Sharma, Seba Susan
Citations: 2

Abstract
Covid-19 has spread across the world and many different vaccines have been developed to counter its surge. To identify the correct sentiments associated with the vaccines from social media posts, this paper aims to fine-tune pre-trained transformer models on tweets associated with different Covid vaccines, specifically RoBERTa, XLNet and BERT which are recently introduced state-of-the-art bi-directional transformer models, and domain-specific transformer models BERTweet and CT-BERT that are pre-trained on Covid-19 tweets. We further explore the option of data augmentation by text oversampling using LMOTE to improve the accuracies of these models, specifically, for small sample datasets where there is an imbalanced class distribution among the positive, negative and neutral sentiment classes. Our results summarize our findings on the suitability of text oversampling for imbalanced, small sample datasets that are used to fine-tune state-of-the-art pre-trained transformer models, and the utility of having domain-specific transformer models for the classification task.
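
As a rough illustration of what balancing an imbalanced three-class sentiment dataset before fine-tuning might look like, the sketch below duplicates minority-class examples at random until every class matches the largest one. This is plain random oversampling, used here only as a stand-in: it is not LMOTE, whose procedure the abstract does not describe. The function name `random_oversample` and the toy tweets are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class texts at random until all classes are the same size.

    Plain random oversampling, NOT the LMOTE method named in the abstract.
    """
    rng = random.Random(seed)

    # Group the texts by their sentiment label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement to fill the gap for smaller classes.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Made-up toy examples for illustration only (not data from the paper).
texts = [
    "Second dose booked for Friday.",
    "Clinic opens at nine for vaccinations.",
    "The vaccine rollout starts next week.",
    "Got my shot today, feeling grateful!",
    "So relieved my parents are vaccinated.",
    "The side effects were awful for me.",
]
labels = ["neutral", "neutral", "neutral", "positive", "positive", "negative"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

The balanced lists could then be passed to whatever fine-tuning pipeline is in use. In practice, oversampling should be applied to the training split only, so that duplicated examples do not leak into evaluation data.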
Source journal: Computer Science-AGH (Computer Science, Theory & Methods)
CiteScore: 1.40
Self-citation rate: 0.00%
Articles published: 18
Review time: 20 weeks