{"title":"Zero-Shot Cross-Lingual Named Entity Recognition via Progressive Multi-Teacher Distillation","authors":"Zhuoran Li;Chunming Hu;Richong Zhang;Junfan Chen;Xiaohui Guo","doi":"10.1109/TASLP.2024.3449029","DOIUrl":null,"url":null,"abstract":"Cross-lingual learning aims to transfer knowledge from one natural language to another. Zero-shot cross-lingual named entity recognition (NER) tasks are to train an NER model on source languages and to identify named entities in other languages. Existing knowledge distillation-based models in a teacher-student manner leverage the unlabeled samples from the target languages and show their superiority in this setting. However, the valuable similarity information between tokens in the target language is ignored. And the teacher model trained solely on the source language generates low-quality pseudo-labels. These two facts impact the performance of cross-lingual NER. To improve the reliability of the teacher model, in this study, we first introduce one extra simple binary classification teacher model by similarity learning to measure if the inputs are from the same class. We note that this binary classification auxiliary task is easier, and the two teachers simultaneously supervise the student model for better performance. Furthermore, given such a stronger student model, we propose a progressive knowledge distillation framework that extensively fine-tunes the teacher model on the target-language pseudo-labels generated by the student model. Empirical studies on three datasets across seven different languages show that our presented model outperforms state-of-the-art methods.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4617-4630"},"PeriodicalIF":4.1000,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10645066/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
Abstract
Cross-lingual learning aims to transfer knowledge from one natural language to another. Zero-shot cross-lingual named entity recognition (NER) trains an NER model on source languages and uses it to identify named entities in other languages. Existing knowledge distillation-based models, which follow a teacher-student paradigm, leverage unlabeled samples from the target languages and perform well in this setting. However, they ignore the valuable similarity information between tokens in the target language, and the teacher model, trained solely on the source language, generates low-quality pseudo-labels. Both facts degrade cross-lingual NER performance. To improve the reliability of the teacher model, in this study we first introduce an extra, simple binary-classification teacher model, trained by similarity learning, that judges whether two inputs belong to the same class. This auxiliary binary classification task is easier than full NER, and the two teachers jointly supervise the student model for better performance. Furthermore, given this stronger student model, we propose a progressive knowledge distillation framework that further fine-tunes the teacher model on target-language pseudo-labels generated by the student model. Empirical studies on three datasets covering seven different languages show that our model outperforms state-of-the-art methods.
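To make the training dynamic concrete, below is a minimal PyTorch-style sketch of the two ingredients the abstract describes: a student supervised jointly by an NER teacher and a binary similarity teacher, followed by a progressive round that fine-tunes the teacher on student-generated pseudo-labels. All names (`student`, `ner_teacher`, `sim_teacher`), the loss weighting `alpha`, the temperature `tau`, and the pairwise-similarity formulation are illustrative assumptions, not the authors' released implementation.

```python
# A sketch under assumed shapes: token logits are (B, T, C) and the
# similarity teacher outputs (B, T, T) same-class probabilities.
import torch
import torch.nn.functional as F

def two_teacher_step(student, ner_teacher, sim_teacher, batch, alpha=0.5, tau=2.0):
    """One student update on an unlabeled target-language batch,
    supervised jointly by the NER teacher and the similarity teacher."""
    with torch.no_grad():
        t_logits = ner_teacher(batch)   # (B, T, C) token-level label logits
        t_sim = sim_teacher(batch)      # (B, T, T) prob. two tokens share a class

    s_logits = student(batch)           # (B, T, C)

    # Standard soft-label KD against the NER teacher (temperature-scaled KL).
    kd_loss = F.kl_div(
        F.log_softmax(s_logits / tau, dim=-1),
        F.softmax(t_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2

    # Similarity KD: the student's implied pairwise same-class probabilities
    # should match the binary similarity teacher.
    s_prob = F.softmax(s_logits, dim=-1)
    s_sim = torch.einsum("btc,buc->btu", s_prob, s_prob)  # P(tokens t, u same class)
    sim_loss = F.binary_cross_entropy(s_sim.clamp(1e-6, 1 - 1e-6), t_sim)

    return alpha * kd_loss + (1 - alpha) * sim_loss

def progressive_round(student, ner_teacher, target_loader, optimizer):
    """Progressive step: fine-tune the NER teacher on hard pseudo-labels
    produced by the (now stronger) student for the target language."""
    for batch in target_loader:
        with torch.no_grad():
            pseudo = student(batch).argmax(dim=-1)  # (B, T) pseudo-labels
        loss = F.cross_entropy(ner_teacher(batch).flatten(0, 1), pseudo.flatten())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Under these assumptions, the two phases would alternate: the refreshed teacher re-distills into the student, so pseudo-label quality and student accuracy can improve together across rounds.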
Journal Description:
The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.