Domain-specific knowledge distillation yields smaller and better models for conversational commerce

Proceedings of The Fifth Workshop on e-Commerce and NLP (ECNLP 5) DOI: 10.18653/v1/2022.ecnlp-1.18

Kristen Howell, Jian Wang, Akshay Hazare, Joe Bradley, Chris Brew, Xi Chen, Matthew Dunn, Beth-Ann Hockey, Andrew Maurer, D. Widdows

Citations: 2

Abstract

We demonstrate that knowledge distillation can be used not only to reduce model size but also to adapt a contextual language model to a specific domain at the same time. We use Multilingual BERT (mBERT; Devlin et al., 2019) as a starting point and follow the knowledge distillation approach of Sanh et al. (2019) to train a smaller multilingual BERT model that is adapted to the domain at hand. We show that for in-domain tasks, the domain-specific model achieves on average a 2.3% improvement in F1 score relative to a model distilled on domain-general data. Whereas much previous work with BERT has fine-tuned the encoder weights during task training, we show that the improvements from distillation on in-domain data persist even when the encoder weights are frozen during task training, allowing a single encoder to support classifiers for multiple tasks and languages.
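As an illustration of the soft-target objective that knowledge distillation typically relies on, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the temperature value, and the toy logits are assumptions made for this example, and the full recipe of Sanh et al. (2019) combines a soft-target term like this one with further loss terms that are omitted here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The result is scaled by temperature**2, a common convention that keeps
    gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over a three-token vocabulary (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 1.0, 4.0]

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

In this sketch, a student whose logits track the teacher's receives a smaller loss than one that disagrees with it. Under the approach described in the abstract, the domain adaptation would come from the choice of text the loss is computed on (in-domain rather than domain-general data), not from a change to the loss itself.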


