{"title":"IITK@Detox at SemEval-2021 Task 5: Semi-Supervised Learning and Dice Loss for Toxic Spans Detection","authors":"Archit Bansal, Abhay Kaushik, Ashutosh Modi","doi":"10.18653/v1/2021.semeval-1.24","DOIUrl":null,"url":null,"abstract":"In this work, we present our approach and findings for SemEval-2021 Task 5 - Toxic Spans Detection. The task’s main aim was to identify spans to which a given text’s toxicity could be attributed. The task is challenging mainly due to two constraints: the small training dataset and imbalanced class distribution. Our paper investigates two techniques, semi-supervised learning and learning with Self-Adjusting Dice Loss, for tackling these challenges. Our submitted system (ranked ninth on the leader board) consisted of an ensemble of various pre-trained Transformer Language Models trained using either of the above-proposed techniques.","PeriodicalId":444285,"journal":{"name":"International Workshop on Semantic Evaluation","volume":"140 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Semantic Evaluation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2021.semeval-1.24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
In this work, we present our approach and findings for SemEval-2021 Task 5: Toxic Spans Detection. The task’s main aim was to identify the spans to which a given text’s toxicity can be attributed. The task is challenging mainly due to two constraints: the small size of the training dataset and an imbalanced class distribution. Our paper investigates two techniques, semi-supervised learning and learning with Self-Adjusting Dice Loss, for tackling these challenges. Our submitted system (ranked ninth on the leaderboard) consisted of an ensemble of pre-trained Transformer language models, each trained with one of the two techniques above.
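The class imbalance motivates the Dice-based objective: toxic tokens are rare, so plain cross-entropy lets the majority class dominate training. Below is a minimal PyTorch sketch of the Self-Adjusting Dice Loss of Li et al. (2020, "Dice Loss for Data-imbalanced NLP Tasks") for binary token tagging; the function name, the two-logit softmax setup, and the alpha/gamma defaults are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn.functional as F

def self_adjusting_dice_loss(logits, targets, alpha=1.0, gamma=1.0):
    """Self-Adjusting Dice Loss (Li et al., 2020) for binary token tagging.

    logits:  (N, 2) raw scores per token (class 1 = toxic).
    targets: (N,)   gold labels in {0, 1}.
    alpha:   exponent that down-weights easy, well-classified tokens.
    gamma:   smoothing constant that keeps the ratio defined when both
             the prediction and the label are (near) zero.
    (alpha and gamma defaults here are illustrative, not the paper's settings.)
    """
    p1 = F.softmax(logits, dim=-1)[:, 1]   # P(toxic) per token
    y1 = targets.float()
    adjusted = ((1 - p1) ** alpha) * p1    # self-adjusting factor
    dsc = (2 * adjusted * y1 + gamma) / (adjusted + y1 + gamma)
    return (1 - dsc).mean()                # loss = 1 - soft Dice coefficient

# Toy usage: 5 tokens, 2 classes
logits = torch.randn(5, 2)
targets = torch.tensor([0, 1, 1, 0, 0])
loss = self_adjusting_dice_loss(logits, targets)
```

The `(1 - p1) ** alpha` factor pushes the loss to focus on hard positives, which matters here because toxic spans make up only a small fraction of the tokens.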