Mohammad Dehghani, Diyana Tehrany Dehkordy, M. Bahrani
Abusive words Detection in Persian tweets using machine learning and deep learning techniques
2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS), published 2021-12-29
DOI: 10.1109/ICSPIS54653.2021.9729390 (https://doi.org/10.1109/ICSPIS54653.2021.9729390)
Citations: 4
Abstract
As the web has grown and user interaction has increased, people increasingly share their opinions on a wide range of topics online. In recent years, detecting abusive language in user-generated content has become a necessity. Twitter is a platform on which users share short text messages, and people there express opinions on many topics in many styles of writing, some of which include abusive words. On the one hand, abusive comments can be derogatory and harmful to those who share content. On the other hand, filtering such comments in languages other than English is difficult and time-consuming. Most social media platforms are still looking for more efficient ways to filter comments because manual moderation is expensive, slow, and risky. Automation helps identify and filter abusive comments more reliably and improves user safety. This article presents a deep learning method for detecting abusive words in Persian tweets. Because suitable Persian data was lacking, we created a dataset of 33338 Persian tweets, of which 10% contain abusive words and 90% are non-abusive. Perhaps the simplest approach is to filter comments against a fixed word list, so we prepared a list of 648 Persian abusive words and evaluated it on the dataset, where it reached an accuracy of 76%. Finally, we implemented a deep neural network based on the BERT language model to detect abusive words; it performed best, with an accuracy of 97.7%.
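The fixed-list baseline described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the lexicon entries, the sample texts and labels, and the function names `contains_abusive` and `accuracy` are hypothetical placeholders, since the 648-word Persian list and the tweet dataset are not reproduced here.

```python
import re

# Hypothetical placeholder lexicon (assumption); the paper's 648-word
# Persian list is not available in this text.
ABUSIVE_LEXICON = {"badword", "slur"}

# \w+ matches runs of Unicode letters, digits, and underscores in Python 3.
# Real Persian text would need extra normalization first, for example for the
# zero-width non-joiner, which \w does not match.
TOKEN_RE = re.compile(r"\w+")


def contains_abusive(text, lexicon=ABUSIVE_LEXICON):
    """Return True if any token of `text` appears in `lexicon`."""
    tokens = TOKEN_RE.findall(text.lower())
    return any(token in lexicon for token in tokens)


def accuracy(texts, labels, lexicon=ABUSIVE_LEXICON):
    """Fraction of texts whose lexicon prediction equals the gold label."""
    if len(texts) != len(labels) or not texts:
        raise ValueError("texts and labels must be non-empty and equal in length")
    correct = sum(
        contains_abusive(text, lexicon) == label
        for text, label in zip(texts, labels)
    )
    return correct / len(texts)


# Toy data (assumption), with True meaning "abusive".
SAMPLE_TEXTS = [
    "you are a badword",
    "nice weather today",
    "what a slur!",
    "you are a b4dword",  # obfuscated spelling that the fixed list misses
]
SAMPLE_LABELS = [True, False, True, True]

if __name__ == "__main__":
    print(accuracy(SAMPLE_TEXTS, SAMPLE_LABELS))
```

The last toy example shows one way such a filter can fall short: an exact-match list misses obfuscated or misspelled variants, so any spelling absent from the list passes through unflagged.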