The Improvement of Stress Level Detection in Twitter: Imbalance Classification Using SMOTE

Mohd Shahrul Nizam Mohd Danuri, Rohizah Abd Rahman, I. Mohamed, Azzan Amin
Published in: 2022 IEEE International Conference on Computing (ICOCO), 14 November 2022
DOI: 10.1109/ICOCO56118.2022.10031684
Citations: 1

Abstract

This study developed a model that improves stress level detection by applying the Synthetic Minority Oversampling Technique (SMOTE) to imbalanced data classification. SMOTE addresses imbalanced datasets by oversampling the minority class. Data collected from Twitter may appear vague, mainly because of the massive volume of data. The research followed a framework comprising Data, Expert Data Annotation, Text Pre-processing, and Text Representation and Classification. Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and Lemma representations were used for text representation. The data were collected exclusively from Twitter under specific conditions. Subject Matter Experts (SMEs) in mental health problems annotated the tweets at four stress levels: Normal, Mild, Moderate, and Severe. The Normal class was considerably larger than the other classes, so SMOTE was used for data augmentation to counter the imbalance. The results showed that Support Vector Machine (SVM) classification with SMOTE outperformed the baseline: increasing the cardinality of the minority class labels produced significant improvements in Macro Avg Recall and Macro Avg F1-Score.
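The abstract gives no implementation details. The following is therefore only an illustrative sketch of the two ideas it relies on. The first is SMOTE oversampling, which creates synthetic minority-class vectors by interpolating between a minority sample and one of its k nearest minority-class neighbours. The second is the macro-averaged recall and F1-score used to compare against the baseline.

The sketch is written in pure Python so that it is self-contained. The function names (`smote`, `oversample_to_balance`, `macro_recall_f1`), the default k = 5, the toy 2-D vectors, and the example predictions are hypothetical and are not taken from the paper. In practice one would more likely use a library implementation such as `imblearn.over_sampling.SMOTE`, and would resample only the training split so that synthetic points never leak into the evaluation data.

```python
# Illustrative sketch only, not the paper's implementation.
# All function names and data below are hypothetical.
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors by interpolating between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # k nearest neighbours of each minority sample, excluding the sample itself
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        # new point lies on the segment from minority[i] towards its neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], nn)])
    return synthetic


def oversample_to_balance(X, y, k=5, seed=0):
    """Oversample every smaller class with SMOTE until it matches the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(v) for v in X], list(y)
    for label, count in counts.items():
        if count < target:
            minority = [x for x, lab in zip(X, y) if lab == label]
            new = smote(minority, target - count, k=k, seed=seed)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    return X_out, y_out


def macro_recall_f1(y_true, y_pred):
    """Unweighted mean of per-class recall and per-class F1 (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    recalls, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return sum(recalls) / len(labels), sum(f1s) / len(labels)


if __name__ == "__main__":
    # Hypothetical 2-D vectors standing in for BoW/TF-IDF features of tweets.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
         [0.4, 0.2], [1.0, 1.0], [1.2, 0.9]]
    y = ["Normal"] * 6 + ["Severe"] * 2
    Xb, yb = oversample_to_balance(X, y)
    print(sorted(Counter(yb).items()))  # [('Normal', 6), ('Severe', 6)]

    # A classifier that always predicts the majority class looks acceptable on
    # accuracy (6/9) but scores poorly on the macro-averaged metrics.
    y_true = ["Normal"] * 6 + ["Mild", "Moderate", "Severe"]
    y_pred = ["Normal"] * 9
    r, f = macro_recall_f1(y_true, y_pred)
    print(round(r, 2), round(f, 2))  # 0.25 0.2
```

The majority-class example shows why macro averages suit this setting. Each of the four stress levels counts equally, so a model that ignores Mild, Moderate, and Severe drops to a macro recall of 0.25, even though its accuracy is about 0.67.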