Indonesian Hate Speech Detection Using IndoBERTweet and BiLSTM on Twitter

Juanietto Forry Kusuma, Andry Chowanda
DOI: 10.30630/joiv.7.3.1035 (https://doi.org/10.30630/joiv.7.3.1035)
Published 2023-09-10 in JOIV International Journal on Informatics Visualization (journal article)
Citations: 0

Abstract

Hate speech is speech intended to spread hatred toward other people. In the digital era, where nearly everyone is connected through social media, hate speech is growing rapidly and uncontrollably. Many people do not realize they are engaging in hate speech when they criticize something on social media, because they are unaware of the difference between hate speech and free speech. As a result, victims feel alienated from society, and those who spread hate speech often face legal consequences. Detecting whether a sentence contains hate speech is therefore essential to counter this lack of awareness. Machine learning algorithms are widely used for this detection task. In this paper, we used deep learning, a subset of machine learning, with IndoBERTweet, the latest IndoBERT model, and combined it with a BiLSTM, a recurrent neural network (RNN) layer. The availability of IndoBERTweet creates an opportunity to further improve text classification performance by adding a BiLSTM layer. The model first produces token representations of the sentence, then processes them and classifies the sentence based on the result. We trained the model on a labeled public dataset retrieved from Twitter, in which each example is labeled as hate speech or non-hate speech. On evaluation, the model achieved an accuracy of 93.7%, an improvement over previous research on classifying hate speech sentences.
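The pipeline described in the abstract (token representations, then a BiLSTM, then a binary classification) can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the class name `BiLSTMHead`, the hidden size of 128, the dropout rate, and the masked mean pooling are all assumptions, since the abstract does not give these details. Random tensors stand in for the IndoBERTweet encoder output, and the embedding width of 768 assumes a BERT-base-sized encoder.

```python
import torch
from torch import nn


class BiLSTMHead(nn.Module):
    """Hypothetical BiLSTM classifier over contextual token embeddings."""

    def __init__(self, embed_dim=768, lstm_hidden=128, num_labels=2, dropout=0.1):
        super().__init__()
        # Bidirectional LSTM reads the token sequence in both directions.
        self.bilstm = nn.LSTM(embed_dim, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        # Forward and backward states are concatenated, hence 2 * lstm_hidden.
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, token_embeddings, attention_mask):
        # token_embeddings: (batch, seq_len, embed_dim)
        # attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
        lstm_out, _ = self.bilstm(token_embeddings)  # (batch, seq_len, 2 * lstm_hidden)
        mask = attention_mask.unsqueeze(-1).to(lstm_out.dtype)
        # Masked mean pooling (an assumed choice): average over real tokens only.
        # Padding is not packed here, so the backward pass still sees pad positions.
        pooled = (lstm_out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.classifier(self.dropout(pooled))  # (batch, num_labels)


torch.manual_seed(0)
head = BiLSTMHead()
head.eval()  # disable dropout for a deterministic forward pass

# Stand-in for encoder output: 4 tweets, 16 tokens each, 768-dim embeddings.
fake_embeddings = torch.randn(4, 16, 768)
fake_mask = torch.ones(4, 16, dtype=torch.long)
fake_mask[0, 10:] = 0  # pretend the first tweet has 6 padding tokens

with torch.no_grad():
    logits = head(fake_embeddings, fake_mask)

# Index 0 or 1 per tweet; which index means "hate speech" depends on the dataset.
predictions = logits.argmax(dim=-1)
print(logits.shape, predictions.shape)
```

In a full system, `fake_embeddings` would be replaced by the last hidden states of a pretrained IndoBERTweet encoder, and the head would be trained with a cross-entropy loss on the labeled tweets.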
Source journal
JOIV International Journal on Informatics Visualization (Decision Sciences: Information Systems and Management)
CiteScore: 1.40
Self-citation rate: 0.00%
Articles published: 100
Review time: 16 weeks