Xiang Li, Zhi Zeng, Mingmin Wu, Zhongqiang Huang, Ying Sha, Lei Shi
Title: An Offensive Language Identification Based on Deep Semantic Feature Fusion
Venue: 2022 IEEE 8th International Conference on Computer and Communications (ICCC)
Publication date: 2022-12-09
DOI: https://doi.org/10.1109/ICCC56324.2022.10066011
Citations: 0
Abstract
Social interactions are often marked by toxic or offensive words, collectively referred to as offensive language, which has become a distinct linguistic phenomenon on social media platforms. Detecting and identifying offensive language on these platforms has become an important research problem in natural language processing. Existing methods use machine learning algorithms or deep learning-based text representation models to learn the features of offensive language and identify it, and they have achieved good performance. However, traditional machine learning-based methods rely mainly on keyword identification and blocking, while deep learning-based methods do not adequately explore fused deep semantic features that combine word-level embeddings with sentence-level deep semantic representations. As a result, neither approach can effectively identify offensive language that contains no common offensive words but still conveys an offensive meaning. In this research, we propose a novel offensive language identification model based on deep semantic feature fusion. The model uses the pre-trained model BERT to obtain word-level embedding representations of offensive language, then applies an RCNN combined with an attention mechanism to extract fused deep semantic feature representations, and finally uses a label encoder and an offensive predictor to improve identification accuracy and generalization ability. Consequently, the model's performance does not rely entirely on an offensive language lexicon, and it can identify offensive language that contains no common offensive words but still conveys an offensive meaning. Experimental results on Wikipedia and Twitter comment datasets show that the proposed model better understands context, discovers latent offensive meanings, and outperforms existing methods.
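The abstract does not spell out how the word-level and sentence-level features are fused, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes toy 2-dimensional vectors in place of BERT embeddings, omits the recurrent and convolutional parts of the RCNN, and fuses a max-pooled word-level summary with an attention-weighted sentence-level summary by concatenation. The function names, the `query` vector, and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(token_vecs, query):
    """Weight each token by its dot product with a (hypothetical) query vector."""
    scores = [sum(t * q for t, q in zip(vec, query)) for vec in token_vecs]
    weights = softmax(scores)
    dim = len(token_vecs[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, token_vecs))
        for d in range(dim)
    ]
    return pooled, weights


def max_pool(token_vecs):
    """Element-wise maximum over tokens, as in RCNN-style pooling."""
    return [max(column) for column in zip(*token_vecs)]


def fuse(token_vecs, query):
    """Concatenate the word-level and sentence-level summaries."""
    sentence_vec, weights = attention_pool(token_vecs, query)
    word_vec = max_pool(token_vecs)
    return word_vec + sentence_vec, weights


# Toy stand-ins for word-level embeddings of a three-token comment (assumed values).
tokens = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.8]]
query = [1.0, 0.0]  # hypothetical attention query; learned in a real model

fused, weights = fuse(tokens, query)
print(len(fused), weights)
```

In a real model, the fused vector would be passed to a trained classifier (the abstract's "offensive predictor"); the query vector and any classifier weights would be learned from labeled data rather than fixed by hand.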