Improving the performance of Transformer Context Encoders for NER
Avi Chawla, Nidhi Mulay, Vikas Bishnoi, Gaurav Dhama
2021 IEEE 24th International Conference on Information Fusion (FUSION), November 2021
DOI: 10.23919/fusion49465.2021.9627061 (https://doi.org/10.23919/fusion49465.2021.9627061)
Citations: 1
Abstract
Large Transformer-based models have provided state-of-the-art results on a variety of Natural Language Processing (NLP) tasks. While these Transformer models perform exceptionally well across a wide range of NLP tasks, their use in sequence labeling has remained limited. Although pretrained Transformer models such as BERT and XLNet have been successfully employed as input representations, the Transformer is still rarely used as the context encoder for sequence labeling, and most recent works still rely on recurrent architectures in that role. In this paper, we compare the performance of Transformer and recurrent architectures as context encoders on the Named Entity Recognition (NER) task. We modify the character-level representation module used in previously proposed NER models in the literature and show how this modification improves the NER model's performance. We also explore data augmentation as a method for further enhancing performance. Experimental results on three NER datasets show that our proposed techniques, using the Transformer encoder, establish a new state of the art over previously proposed models in the literature while using only non-contextualized embeddings.
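To make the described setup concrete, below is a minimal PyTorch sketch of an NER tagger of this general shape: non-contextualized word embeddings concatenated with a character-level representation, a Transformer encoder serving as the context encoder, and a per-token tag classifier. This is an illustrative assumption of the architecture family the abstract refers to, not the paper's exact configuration; all module choices (a CNN for the character module), dimensions, and names are placeholders, and the paper's data augmentation and any decoder layer are omitted.

```python
import torch
import torch.nn as nn


class CharCNN(nn.Module):
    """Character-level representation: embed characters, convolve, max-pool per word."""

    def __init__(self, n_chars: int, char_dim: int = 30, out_dim: int = 50, kernel: int = 3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, out_dim, kernel_size=kernel, padding=kernel // 2)

    def forward(self, chars: torch.Tensor) -> torch.Tensor:
        # chars: (batch, seq_len, max_word_len) of character ids
        b, t, w = chars.shape
        x = self.embed(chars.view(b * t, w))      # (b*t, w, char_dim)
        x = self.conv(x.transpose(1, 2))          # (b*t, out_dim, w)
        x = x.max(dim=-1).values                  # max-pool over characters
        return x.view(b, t, -1)                   # (batch, seq_len, out_dim)


class TransformerNERTagger(nn.Module):
    """Word + character features -> Transformer context encoder -> per-token tag scores."""

    def __init__(self, n_words: int, n_chars: int, n_tags: int,
                 word_dim: int = 100, char_out: int = 50,
                 n_heads: int = 5, n_layers: int = 2):
        super().__init__()
        # Non-contextualized word embeddings (e.g. GloVe-sized), as in the abstract.
        self.word_embed = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.char_repr = CharCNN(n_chars, out_dim=char_out)
        d_model = word_dim + char_out
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.context_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.tag_head = nn.Linear(d_model, n_tags)  # emission scores; a CRF could sit on top

    def forward(self, words, chars, pad_mask=None):
        x = torch.cat([self.word_embed(words), self.char_repr(chars)], dim=-1)
        h = self.context_encoder(x, src_key_padding_mask=pad_mask)
        return self.tag_head(h)                    # (batch, seq_len, n_tags)


# Toy usage: 2 sentences, 6 tokens each, words padded to 10 characters.
model = TransformerNERTagger(n_words=5000, n_chars=100, n_tags=9)
words = torch.randint(1, 5000, (2, 6))
chars = torch.randint(1, 100, (2, 6, 10))
print(model(words, chars).shape)  # torch.Size([2, 6, 9])
```

Swapping the `nn.TransformerEncoder` for a BiLSTM of matching hidden size would give the recurrent context-encoder baseline that the paper compares against.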