Cyberbullying detection of resource constrained language from social media using transformer-based approach

Syed Sihab-Us-Sakib, Md. Rashadur Rahman, Md. Shafiul Alam Forhad, Md. Atiq Aziz
Natural Language Processing Journal, Volume 9, Article 100104. Published 2024-09-16.
DOI: 10.1016/j.nlp.2024.100104
https://www.sciencedirect.com/science/article/pii/S2949719124000529

Abstract

The rise of the internet and social media has enabled diverse interactions among individuals, but it has also led to an increase in cyberbullying, which harms mental health and can induce suicidal thoughts. To address this problem, we developed the Cyberbullying Bengali Dataset (CBD), a novel resource of 2751 manually labeled texts in five classes that cover several forms of cyberbullying as well as non-bullying instances. We evaluated a range of machine learning and deep learning models on this dataset. Among traditional machine learning models, we examined Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Random Forest (RF). Among deep learning models, we explored Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM). We also experimented with state-of-the-art transformer architectures, including m-BERT, BanglaBERT, and XLM-RoBERTa. XLM-RoBERTa was the most effective model, achieving an F1-score of 0.83 and an accuracy of 82.61% and outperforming all other models. Our work provides insights into effective cyberbullying detection on platforms such as Facebook, YouTube, and Instagram.
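The abstract reports an F1-score and an accuracy for a five-class task but does not say how the F1-score is averaged across classes. The following is a minimal sketch of how both metrics can be computed from predictions, assuming either macro or support-weighted averaging. The integer class ids 0 to 4 and the toy labels are hypothetical placeholders, not data or class names from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f1(y_true, y_pred, labels):
    """One-vs-rest F1 for each label: 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that never appears in gold or predictions gets 0.0.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (every class counts equally)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels):
    """Mean of per-class F1 weighted by each class's gold support."""
    scores = per_class_f1(y_true, y_pred, labels)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in labels) / len(y_true)


# Hypothetical toy data: five classes (ids 0..4), two gold examples each.
LABELS = [0, 1, 2, 3, 4]
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 2, 2, 2, 3, 4, 4, 4]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred, LABELS))
print("weighted F1:", weighted_f1(y_true, y_pred, LABELS))
```

Macro and weighted F1 coincide on this toy set because every class has the same support. On an imbalanced dataset they can differ noticeably, so it helps to know which one a reported figure refers to.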