Cyberbullying Detection in Urdu Language Using Machine Learning
Sara Khan, Amna Qureshi
2022 International Conference on Emerging Trends in Electrical, Control, and Telecommunication Engineering (ETECTE), published 2022-12-02
DOI: 10.1109/ETECTE55893.2022.10007379 (https://doi.org/10.1109/ETECTE55893.2022.10007379)
Citation count: 0
Abstract
Cyberbullying has become a significant problem with the surge in social media use. The most basic way to prevent cyberbullying on social media platforms is to identify and remove offensive comments. However, it is impractical for humans to read and remove all such comments manually. Current research focuses on using machine learning to detect and eliminate cyberbullying. Although most of this work has been conducted on English text, little to no work can be found for Urdu. This paper aims to detect cyberbullying in users' comments posted in Urdu on Twitter using machine learning and Natural Language Processing (NLP) techniques. To the best of our knowledge, cyberbullying detection on Urdu text comments has not been performed, owing to the lack of a publicly available standard Urdu dataset. In this paper, we created a dataset of offensive user-generated Urdu comments from Twitter. The comments in the dataset are classified into five categories. N-gram techniques are used to extract features at the character and word levels. Various supervised machine learning techniques are applied to the dataset to detect cyberbullying. Evaluation metrics such as precision, recall, accuracy, and F1 score are used to analyse the performance of these techniques.
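As an illustration of the feature extraction and evaluation steps named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenisation, raw n-gram counts as features, and per-class metrics. The function names, the `w:`/`c:` key prefixes, and the labels `bully`/`ok` are hypothetical; the abstract does not name the five categories, the n-gram orders, or the classifiers used.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word-level n-grams as tuples of n whitespace-separated tokens."""
    tokens = text.split()  # assumption: plain whitespace tokenisation
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character-level n-grams as substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_features(text, word_n=2, char_n=3):
    """Count word and character n-grams in one feature dictionary.

    Keys are prefixed with "w:" or "c:" (a hypothetical convention) so the
    two levels cannot collide.
    """
    features = Counter()
    for gram in word_ngrams(text, word_n):
        features["w:" + " ".join(gram)] += 1
    for gram in char_ngrams(text, char_n):
        features["c:" + gram] += 1
    return features


def class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # A neutral Urdu sentence ("this is an example"); Python strings are
    # Unicode, so the same slicing works for Urdu script.
    print(ngram_features("یہ ایک مثال ہے"))
    # Hypothetical gold labels and predictions for four comments.
    gold = ["bully", "ok", "bully", "ok"]
    pred = ["bully", "bully", "ok", "ok"]
    print(class_metrics(gold, pred, positive="bully"))
```

With five categories, `class_metrics` would be called once per category and the per-class scores averaged. Whether to macro- or micro-average is a design choice that the abstract leaves open.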