Cyberbullying Sentiment Analysis with Word2Vec and One-Against-All Support Vector Machine

Lionel Reinhart Halim, A. Suryadibrata
Journal: IJNMT (International Journal of New Media Technology)
Published: 2021-06-27
DOI: 10.31937/ijnmt.v8i1.2047 (https://doi.org/10.31937/ijnmt.v8i1.2047)
Citations: 4

Abstract

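Depression and social anxiety are the two main negative impacts of cyberbullying. A survey conducted by UNICEF on 3 September 2019 showed that 1 in 3 young people in 30 countries had been victims of cyberbullying. This study applies sentiment analysis to detect comments that contain cyberbullying, using the Toxic Comment Classification Challenge dataset from Kaggle. Pre-processing consists of four stages: comment generalization (converting text to lowercase and removing punctuation), tokenization, stop-word removal, and lemmatization. Word2Vec is used as the word embedding. A One-Against-All (OAA) scheme with Support Vector Machine (SVM) models then produces multi-label predictions, and the SVM hyperparameters are tuned with Randomized Search CV. The model is evaluated with the micro-averaged F1 score, to assess prediction accuracy, and Hamming loss, to assess how many sample-label pairs are incorrectly classified. The Word2Vec and OAA SVM model performs best on data pre-processed with all four stages and stored as 100 features in the Word2Vec model. The tuned model achieves a micro-averaged F1 of 83.40% and a Hamming loss of 15.13%.

Index Terms: Sentiment Analysis; Word Embedding; Word2Vec; One-Against-All; Support Vector Machine; Toxic Comment Classification Challenge; Multi Labelling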
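The four pre-processing stages named in the abstract can be sketched in plain Python as follows. This is only an illustration: the stop-word set and the lemma lookup table are tiny hypothetical stand-ins (the abstract does not say which resources the authors used), and the function names are invented for this sketch.

```python
import string

# Hypothetical, deliberately tiny resources for illustration only;
# the abstract does not specify the stop-word list or lemmatizer.
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "and", "of", "to"}
LEMMAS = {"idiots": "idiot", "comments": "comment", "were": "be"}


def generalize(comment):
    """Stage 1: lowercase the text and strip ASCII punctuation."""
    table = str.maketrans("", "", string.punctuation)
    return comment.lower().translate(table)


def tokenize(text):
    """Stage 2: split the text into whitespace-separated tokens."""
    return text.split()


def remove_stop_words(tokens):
    """Stage 3: drop tokens that appear in the stop-word set."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def lemmatize(tokens):
    """Stage 4: map each token to its lemma when one is known."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


def preprocess(comment):
    """Run the four stages in the order given in the abstract."""
    return lemmatize(remove_stop_words(tokenize(generalize(comment))))


if __name__ == "__main__":
    print(preprocess("You are the IDIOTS, of course!"))
```

The resulting token lists are what a word-embedding model such as Word2Vec would be trained on; a real pipeline would likely swap in a full stop-word list and a dictionary-based lemmatizer.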
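To make the two evaluation measures in the abstract concrete, here is a minimal pure-Python sketch of the micro-averaged F1 score and Hamming loss for multi-label predictions. It assumes labels are encoded as 0/1 indicator rows (one row per comment, one column per label); the function names and the toy data are illustrative and do not come from the paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over every sample-label pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, return 0.0.
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true, y_pred):
    """Fraction of sample-label pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


if __name__ == "__main__":
    # Toy example: 3 comments, 3 labels each (illustrative data only).
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
    print(micro_f1(y_true, y_pred))      # TP=4, FP=1, FN=1
    print(hamming_loss(y_true, y_pred))  # 2 wrong pairs out of 9
```

In the toy data, 4 true positives, 1 false positive, and 1 false negative give a micro-averaged F1 of 8/10, and 2 wrong pairs out of 9 give a Hamming loss of 2/9. Higher is better for F1, lower is better for Hamming loss.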