{"title":"Improved Bi-GRU Model for Imbalanced English Toxic Comments Dataset","authors":"Zhongguo Wang, Bao Zhang","doi":"10.1145/3508230.3508234","DOIUrl":null,"url":null,"abstract":"Deep learning is widely used in the study of English toxic comment classification. However, most existing studies failed to consider data imbalance. Aiming at an imbalanced English Toxic Comments Dataset, we propose an improved Bi-gated recurrent unit (GRU) model that combines an oversampling and cost-sensitive method. We use random oversampling in the improved model to reduce the data imbalance, introduce a cost-sensitive method, and propose a new loss function for the Bi-GRU model. Experimental results show that the improved Bi-GRU model demonstrates a significantly improved classification performance in the imbalanced English Toxic Comments Dataset.","PeriodicalId":252146,"journal":{"name":"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval","volume":"142 3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508230.3508234","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Deep learning is widely used in the study of English toxic comment classification. However, most existing studies failed to consider data imbalance. Aiming at an imbalanced English Toxic Comments Dataset, we propose an improved Bi-gated recurrent unit (GRU) model that combines an oversampling and cost-sensitive method. We use random oversampling in the improved model to reduce the data imbalance, introduce a cost-sensitive method, and propose a new loss function for the Bi-GRU model. Experimental results show that the improved Bi-GRU model demonstrates a significantly improved classification performance in the imbalanced English Toxic Comments Dataset.