A BERT-based Hate Speech Classifier from Transcribed Online Short-Form Videos

2021 5th International Conference on E-Society, E-Education and E-Technology. Pub Date: 2021-08-21. DOI: 10.1145/3485768.3485806

Rommel Hernandez Urbano Jr., Jeffrey Uy Ajero, Angelic Legaspi Angeles, Maria Nikki Hacar Quintos, Joseph Marvin Regalado Imperial, Ramon Llabanes Rodriguez


Citations: 2

Abstract

With the rise of human-centric technologies such as social media platforms, the amount of hate continues to grow in proportion to the increasing number of users worldwide. TikTok is one of the most-used social media platforms because it lets users express themselves by creating and sharing short-form videos on any topic. It has also become a platform for political discourse and mudslinging, since users can freely express opinions and indirectly debate with strangers online. In this study, we propose the use of BERT, a complex bidirectional transformer-based model, for automatic hate speech detection from speech transcribed from Tagalog TikTok videos. Our experiments show that a BERT-based hate speech classifier scores 61% F1. We also extended the task to several other algorithms, such as LSTM, Naïve Bayes, and Decision Tree, and found that traditional methods such as a simple Bernoulli Naïve Bayes approach remain on par with the BERT model.
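To make the Bernoulli Naïve Bayes baseline and the F1 metric mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the Laplace smoothing constant `alpha`, the English placeholder tokens, and the choice of binary F1 on the positive class are all assumptions made for illustration; the abstract does not state the preprocessing, the features, or the F1 averaging scheme that were used.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on tokenized documents.

    docs   : list of token lists (e.g. words from a transcript)
    labels : list of class labels, one per document
    alpha  : Laplace smoothing constant (assumed value, not from the paper)
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    class_counts = defaultdict(int)
    # doc_freq[c][tok] = number of class-c documents that contain tok
    doc_freq = defaultdict(lambda: defaultdict(int))
    for doc, label in zip(docs, labels):
        class_counts[label] += 1
        for tok in set(doc):  # presence/absence only, not term counts
            doc_freq[label][tok] += 1

    n_docs = len(docs)
    model = {"vocab": vocab, "log_prior": {}, "p_present": {}}
    for c, n_c in class_counts.items():
        model["log_prior"][c] = math.log(n_c / n_docs)
        # Smoothed probability that a token appears in a class-c document
        model["p_present"][c] = {
            tok: (doc_freq[c][tok] + alpha) / (n_c + 2 * alpha) for tok in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior for one document."""
    present = set(doc)
    best_label, best_score = None, -math.inf
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        # Bernoulli NB scores every vocabulary token, present or absent
        for tok in model["vocab"]:
            p = model["p_present"][c][tok]
            score += math.log(p) if tok in present else math.log(1.0 - p)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall on the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up placeholder data: 1 = hate speech, 0 = not hate speech
    train_docs = [
        ["bad", "word", "insult"],
        ["insult", "slur"],
        ["nice", "video"],
        ["good", "dance", "video"],
    ]
    train_labels = [1, 1, 0, 0]
    model = train_bernoulli_nb(train_docs, train_labels)
    print(predict(model, ["insult", "bad"]))
```

Because Bernoulli Naïve Bayes models only whether each vocabulary token is present, it needs no pretrained weights and trains in a single pass over the data, which is one reason it is a common baseline against heavier models such as BERT.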



