Indonesian Hate Speech Detection Using IndoBERTweet and BiLSTM on Twitter

JOIV International Journal on Informatics Visualization (Q3, Decision Sciences). Pub Date: 2023-09-10. DOI: 10.30630/joiv.7.3.1035

Juanietto Forry Kusuma, Andry Chowanda


Citations: 0

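Abstract

Hate speech is speech intended to spread hatred toward other people. In the digital era, where everyone is connected through social media, hate speech is growing rapidly and uncontrollably. Many people do not realize that they are engaging in hate speech when they criticize something on social media, because they are unaware of the difference between hate speech and free speech. As a result, victims feel alienated from society, and the people who spread hate speech often face legal consequences. Detecting whether a sentence contains hate speech is therefore essential to counter this lack of awareness. Machine learning algorithms are widely used for this task. In this paper, we use deep learning, a subset of machine learning, with IndoBERTweet, the latest IndoBERT model, combined with a BiLSTM recurrent (RNN) layer. The release of IndoBERTweet opens up further opportunities to improve text classification performance by adding a BiLSTM layer. The model first builds a token representation of the sentence, then processes that representation and classifies the sentence based on the result. To make the model effective, we trained it on a labeled public dataset retrieved from Twitter. The dataset is labeled as hate speech or non-hate speech, and these labels are supplied to the model. We evaluated the model and achieved an accuracy of 93.7%, an improvement over previous research on classifying hate speech sentences.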
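The pipeline described in the abstract (token representations, a BiLSTM layer, then a two-class decision) could be sketched roughly as follows in PyTorch. This is an illustrative sketch, not the authors' implementation: the class name `EncoderBiLSTMClassifier`, the layer sizes, and the masked mean pooling are all assumptions, and a small `nn.Embedding` stands in for the pretrained IndoBERTweet encoder so that the example runs without downloading any weights.

```python
import torch
import torch.nn as nn


class EncoderBiLSTMClassifier(nn.Module):
    """Token vectors -> BiLSTM -> masked mean pooling -> two-class logits."""

    def __init__(self, vocab_size=1000, encoder_dim=64, lstm_hidden=32, num_classes=2):
        super().__init__()
        # Hypothetical stand-in for IndoBERTweet: a real setup would load the
        # pretrained transformer and use its last hidden states instead.
        self.encoder = nn.Embedding(vocab_size, encoder_dim, padding_idx=0)
        self.bilstm = nn.LSTM(
            encoder_dim, lstm_hidden, batch_first=True, bidirectional=True
        )
        # Forward and backward states are concatenated, hence 2 * lstm_hidden.
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        token_vectors = self.encoder(input_ids)      # (batch, seq, encoder_dim)
        lstm_out, _ = self.bilstm(token_vectors)     # (batch, seq, 2 * lstm_hidden)
        # Average only over real tokens; padded positions are masked out.
        # (Packing the sequences would also keep padding out of the backward pass.)
        mask = attention_mask.unsqueeze(-1).to(lstm_out.dtype)
        pooled = (lstm_out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.classifier(pooled)               # (batch, num_classes)


torch.manual_seed(0)
model = EncoderBiLSTMClassifier()

# Two toy "tweets" as made-up token ids; 0 is the padding id.
input_ids = torch.tensor([[5, 17, 42, 0], [8, 3, 0, 0]])
attention_mask = (input_ids != 0).long()

logits = model(input_ids, attention_mask)
# Assumed label convention for illustration: 0 = non-hate speech, 1 = hate speech.
predictions = logits.argmax(dim=-1)
```

With untrained weights the predictions are arbitrary; the sketch only shows how the encoder output flows through the BiLSTM into a binary decision. The abstract does not say how the BiLSTM outputs are pooled, so mean pooling here is one plausible choice among several, such as using the final hidden states.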


