Deep-Learning-Based Automated Scoring for the Severity of Toxic Comments Using Electra

Tiancong Zhang

Proceedings of the 2022 6th International Conference on Deep Learning Technologies, published 2022-07-26. DOI: 10.1145/3556677.3556693
With the increasing popularity of the Internet, social media plays a crucial role in people's daily communication. However, because of the anonymity of the Internet, toxic comments appear online in a constant stream, seriously harming the health of the online social environment. To effectively reduce their impact, automated methods for scoring the severity of toxic comments are in great demand. To that end, this work proposes a deep-learning-based natural language processing technique that uses ELECTRA to automatically score the toxicity of a comment. The backbone of the model is the ELECTRA discriminator, and the downstream regression task is performed by a head layer on top of it. Three head layers are implemented separately: a multi-layer perceptron, a convolutional neural network, and an attention layer. The training data comes from the Kaggle competition Toxic Comment Classification Challenge, and model performance is evaluated through another Kaggle competition, Jigsaw Rate Severity of Toxic Comments. Boosted by K-Fold cross-validation and an ensemble of the three models with different head layers, the method reaches a competition score of 0.80343. This score ranks 71st of 2301 (top 3.1%) on the leaderboard, earning a silver medal in the competition. The results of this work can help filter toxic comments and harmful text automatically and effectively on the Internet, greatly reducing the cost of manual review and helping to build a healthier Internet environment.
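The architecture described above — a pretrained discriminator backbone feeding a small regression head, with the final score taken as an ensemble over heads — can be sketched roughly as follows. This is not the authors' released code: the hidden size, dropout rate, and head design are illustrative assumptions, a random tensor stands in for the ELECTRA discriminator's output (a real setup would load a checkpoint such as `google/electra-base-discriminator` via Hugging Face `transformers`), and only the MLP head variant is shown, reused three times in place of the paper's MLP/CNN/attention trio.

```python
import torch
import torch.nn as nn

class MLPRegressionHead(nn.Module):
    """MLP head mapping the discriminator's [CLS] representation to one score.

    Hidden size and dropout are illustrative assumptions, not the paper's values.
    """
    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden), e.g. the discriminator's
        # last hidden state; regress from the [CLS] (first) token.
        return self.net(hidden_states[:, 0]).squeeze(-1)

torch.manual_seed(0)
encoder_out = torch.randn(4, 128, 256)  # stand-in for ELECTRA output on 4 comments

# Three heads standing in for the MLP / CNN / attention variants;
# the final toxicity score is the ensemble (mean) of their predictions.
heads = [MLPRegressionHead() for _ in range(3)]
with torch.no_grad():
    scores = torch.stack([h(encoder_out) for h in heads]).mean(dim=0)

print(scores.shape)  # one severity score per comment
```

In the paper's pipeline, each of the three head variants would be trained with K-Fold cross-validation on the Toxic Comment Classification Challenge data, and the per-fold, per-head predictions averaged in the same way as the simple mean shown here.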