Transformer-based Arabic Offensive Speech Detection
Saja Al-Dabet, A. Elmassry, Ban AlOmar, Abdullah Alshamsi
2023 International Conference on Emerging Smart Computing and Informatics (ESCI), published 2023-03-01
DOI: 10.1109/ESCI56872.2023.10100134 (https://doi.org/10.1109/ESCI56872.2023.10100134)
Citation count: 1
Abstract
The prevalence of social media platforms has prompted efforts to detect language in online posts and comments that is intended to harm or intimidate another person or group of people. On Twitter, for instance, users are susceptible to cyberbullying and hate speech, which may develop into physical and psychological violence. This study presents a transformer-based approach to offensive speech detection. The model employs versions of the CAMeLBERT model and is validated on a mixture of four benchmark Arabic Twitter datasets annotated for the hate speech detection task, including the OSACT5 (2022) workshop shared task dataset. The presented model recognized Arabic tweets containing offensive speech with 87.15% accuracy and an F1 score of 83.6%.
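For readers unfamiliar with the two reported metrics, the following minimal sketch shows how accuracy and F1 can be computed for a binary offensive/not-offensive classifier. The abstract does not state whether the reported F1 is the positive-class or the macro-averaged variant, so both are shown. The function names and the toy labels are illustrative assumptions and are not taken from the paper or its datasets.

```python
# Toy sketch of accuracy and F1 for binary labels (1 = offensive, 0 = not offensive).
# All names and data here are hypothetical; they are not the paper's code or results.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


# Made-up gold labels and predictions for eight tweets.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

print(f"accuracy          = {accuracy(y_true, y_pred):.4f}")
print(f"F1 (offensive)    = {f1_for_label(y_true, y_pred, 1):.4f}")
print(f"F1 (macro-averaged) = {macro_f1(y_true, y_pred):.4f}")
```

On imbalanced data, where offensive tweets are a minority, accuracy can look high even when the offensive class is poorly recognized, which is why an F1 score is usually reported alongside it.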