{"title":"Compression Methods for Transformers in Multidomain Sentiment Analysis","authors":"Wojciech Korczynski, Jan Kocoń","doi":"10.1109/ICDMW58026.2022.00062","DOIUrl":null,"url":null,"abstract":"Transformer models like BERT have significantly improved performance on many NLP tasks, e.g., sentiment analysis. However, their large number of parameters makes real-world applications difficult because of computational costs and latency. Many compression methods have been proposed to solve this problem using quantization, weight pruning, and knowledge distillation. In this work, we explore some of these task-specific and task-agnostic methods by comparing their effectiveness and quality on the MultiEmo sentiment analysis dataset. Additionally, we analyze their ability to generalize and capture sentiment features by conducting domain-sentiment experiments. The results show that the compression methods reduce the model size by 8.6 times and the inference time by 6.9 times compared to the original model while maintaining unimpaired quality. Smaller models perform better on tasks with fewer data and retain more remarkable generalization ability after fine-tuning because they are less prone to overfitting. The best trade-off is obtained using the task-agnostic XtremeDistil model.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW58026.2022.00062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Transformer models like BERT have significantly improved performance on many NLP tasks, e.g., sentiment analysis. However, their large number of parameters makes real-world deployment difficult because of computational cost and latency. Many compression methods based on quantization, weight pruning, and knowledge distillation have been proposed to address this problem. In this work, we explore several of these task-specific and task-agnostic methods by comparing their efficiency and quality on the MultiEmo sentiment analysis dataset. Additionally, we analyze their ability to generalize and capture sentiment features by conducting domain-sentiment experiments. The results show that the compression methods reduce the model size by 8.6 times and the inference time by 6.9 times compared to the original model while maintaining unimpaired quality. Smaller models perform better on tasks with less data and retain greater generalization ability after fine-tuning because they are less prone to overfitting. The best trade-off is obtained using the task-agnostic XtremeDistil model.
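To make the compression routes named above concrete, the sketch below shows one of them: post-training dynamic quantization applied on top of a distilled transformer checkpoint. This is a minimal illustration, not the authors' pipeline; the checkpoint name, the number of sentiment labels, and the example sentence are assumptions for demonstration only.

```python
# Minimal sketch: dynamic quantization of a distilled sentiment classifier.
# Assumptions (not taken from the paper): the XtremeDistil checkpoint name,
# num_labels=4, and the example input text.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "microsoft/xtremedistil-l6-h256-uncased"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=4)
model.eval()

# Dynamic quantization stores Linear-layer weights as int8 and dequantizes
# them on the fly, shrinking the model and speeding up CPU inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Great product, works exactly as advertised.", return_tensors="pt")
with torch.no_grad():
    logits = quantized_model(**inputs).logits
print(logits.softmax(dim=-1))  # predicted sentiment class probabilities
```

In practice, distillation (using a small student such as XtremeDistil) and quantization are complementary: the former reduces the number of parameters, the latter the precision at which they are stored, which is how the combined size and latency reductions reported in the abstract become achievable.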