Improving of Imbalanced Data in Multiclass Classification for Sentiment Analysis using Supervised Term Weighting

J. Polpinij, K. Namee
DOI: 10.1109/RI2C51727.2021.9559797 (https://doi.org/10.1109/RI2C51727.2021.9559797)
Published in: 2021 Research, Invention, and Innovation Congress: Innovation Electricals and Electronics (RI2C)
Publication date: 2021-09-01
Citations: 0

Abstract

Sentiment classification (SC) is an active field of research that involves computing the opinions, sentiments, and subjectivity of a text. Imbalanced classification has recently been shown to be a challenge for the SC research community: most existing studies assume a balance between negative and positive samples, an assumption that may not hold in reality. This work describes a method for addressing imbalanced sentiment classification with supervised term weighting schemes and shows how these schemes can improve the performance of sentiment classification on imbalanced data, especially in multi-class classification. To identify the most appropriate scheme, five term weighting schemes are compared: tf-idf, tf-idf-icf, tf-rf, tf-igm, and sqrt_tf-igm. The work also compares four supervised machine learning algorithms to find an appropriate classifier: k-Nearest Neighbor (k-NN), Multinomial Naïve Bayes (MNB), Support Vector Machines (SVM) with a linear kernel, and SVM with an RBF kernel. Evaluated by F1, sqrt_tf-igm outperformed all other weighting schemes. Overall, sqrt_tf-igm returned better results than tf-idf, tf-idf-icf, and tf-rf, improving the F1 score by 10.94%, and it was slightly better than tf-igm.
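The abstract names tf-igm and sqrt_tf-igm without defining them. As a rough illustration only, the sketch below follows the definition commonly given in the TF-IGM literature: the inverse gravity moment of a term is its highest per-class frequency divided by the rank-weighted sum of its class frequencies sorted in descending order, and the term weight is tf (or its square root) multiplied by (1 + λ · igm). The function names, the value λ = 7, and the toy frequencies are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def igm(class_freqs):
    """Inverse gravity moment: f_1 / sum(f_r * r), with frequencies sorted descending."""
    ranked = sorted(class_freqs, reverse=True)
    moment = sum(f * r for r, f in enumerate(ranked, start=1))
    return ranked[0] / moment if moment else 0.0


def term_weights(doc_tokens, class_doc_freq, lam=7.0, sqrt_tf=True):
    """Weight each term of one document with tf-igm or sqrt_tf-igm.

    class_doc_freq maps a term to its per-class frequencies (hypothetical input format).
    lam is the scaling coefficient; 7.0 is an assumed default, not a value from the abstract.
    """
    weights = {}
    for term, count in Counter(doc_tokens).items():
        freqs = class_doc_freq.get(term)
        if not freqs:
            continue  # term unseen in training data: no supervised weight available
        local = math.sqrt(count) if sqrt_tf else float(count)
        weights[term] = local * (1.0 + lam * igm(freqs))
    return weights


# Hypothetical toy statistics for three sentiment classes.
class_doc_freq = {
    "great": [9, 1, 0],  # concentrated in one class, so highly discriminative
    "movie": [4, 3, 3],  # spread almost evenly, so weakly discriminative
}
doc = ["great", "movie", "movie", "movie", "movie"]

w_sqrt = term_weights(doc, class_doc_freq)                  # sqrt_tf-igm
w_plain = term_weights(doc, class_doc_freq, sqrt_tf=False)  # tf-igm
```

In this toy document the square root damps the repeated but weakly discriminative term "movie", so "great" receives the larger weight under sqrt_tf-igm, while plain tf-igm ranks "movie" higher. This shows how the two schemes can differ; it is not a reproduction of the reported results.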