Balanced Knowledge Distillation with Contrastive Learning for Document Re-ranking

Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval. Pub Date: 2023-08-09. DOI: 10.1145/3578337.3605120

Yingrui Yang, Shanxiu He, Yifan Qiao, Wentai Xie, Tao Yang

Citations: 0

Abstract

Knowledge distillation is commonly used in training a neural document ranking model by employing a teacher to guide model refinement. Because a teacher may not be correct in all cases, over-calibration between the student and teacher models can make training less effective. This paper focuses on the KL divergence loss used for knowledge distillation in document re-ranking, and revisits the balancing of knowledge distillation with explicit contrastive learning. The proposed loss function takes a conservative approach to imitating the teacher's behavior, and allows the student to sometimes deviate from the teacher model during training. This paper presents analytic results with an evaluation on MS MARCO passages to validate the usefulness of the proposed loss for transformer-based ColBERT re-ranking.
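The abstract does not give the form of the proposed loss, so the following is only a minimal sketch of the generic idea it builds on: a KL divergence distillation term combined with a contrastive term over the candidate documents for one query. The fixed weight `alpha`, the function names, and the simple linear interpolation are all illustrative assumptions, not the paper's balanced loss.

```python
import math


def softmax(scores):
    """Convert raw ranking scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation_loss(teacher_scores, student_scores):
    """KL(teacher || student) over the candidate documents of one query."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def contrastive_loss(student_scores, positive_index):
    """Softmax cross-entropy that pushes the labeled positive above the negatives."""
    q = softmax(student_scores)
    return -math.log(q[positive_index])


def combined_loss(teacher_scores, student_scores, positive_index, alpha=0.5):
    """Hypothetical fixed-weight mix of the two terms (an assumption, not the paper's loss)."""
    kd = kl_distillation_loss(teacher_scores, student_scores)
    cl = contrastive_loss(student_scores, positive_index)
    return alpha * kd + (1.0 - alpha) * cl


if __name__ == "__main__":
    teacher = [2.0, 3.5, 0.1, -1.0]  # teacher prefers document 1
    student = [2.5, 1.0, 0.3, -0.5]  # student currently prefers document 0
    # Suppose the relevance label says document 0 is the positive,
    # i.e. the teacher is wrong for this query.
    print(combined_loss(teacher, student, positive_index=0, alpha=0.5))
```

In this sketch, lowering `alpha` weakens the pull toward the teacher's distribution, which is one simple way to let a student disagree with a teacher that ranks the labeled positive too low.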
