Cross-domain Knowledge Distillation for Retrieval-based Question Answering Systems

Proceedings of the Web Conference 2021. Pub Date: 2021-04-19. DOI: 10.1145/3442381.3449814

Cen Chen, Chengyu Wang, Minghui Qiu, D. Gao, Linbo Jin, Wang Li


Citations: 5

Abstract




Question Answering (QA) systems have been extensively studied in academia and the wider research community because of their broad real-world applications. When building industrial-scale QA applications, we face two prominent challenges: (i) a lack of sufficient training data to learn an accurate model, and (ii) the need for high inference speed in online model serving. There are generally two ways to mitigate these problems. One is to adopt transfer learning to leverage information from other domains; the other is to distill the “dark knowledge” of a large teacher model into small student models. The former usually employs parameter-sharing mechanisms for knowledge transfer but does not utilize the “dark knowledge” of large pre-trained models. The latter usually does not consider information from other domains. We argue that the two types of methods are complementary. In this work, we therefore provide a new perspective on the potential of the teacher-student paradigm for facilitating cross-domain transfer learning, where the teacher and student tasks belong to heterogeneous domains and the goal is to improve the student model’s performance in the target domain. Our framework considers the “dark knowledge” learned by large teacher models and also leverages adaptive hints to alleviate the domain differences between teacher and student models. Extensive experiments have been conducted on two text matching tasks for retrieval-based QA systems. Results show that the proposed method outperforms competing methods, including existing state-of-the-art transfer learning methods. We have also deployed our method in an online production system and observed significant improvements over existing approaches in terms of both accuracy and cross-domain robustness.
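For readers unfamiliar with the term, “dark knowledge” usually refers to the information carried by a teacher model's full output distribution rather than only its top prediction. The sketch below illustrates the generic temperature-based distillation objective in plain Python. It is not the framework proposed in this paper: the adaptive hints and the cross-domain components are not modeled, and the function names, the `temperature` and `alpha` parameters, and the way the two terms are combined are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation objective (an illustrative assumption, not the paper's loss).

    soft term: KL(teacher || student) on temperature-softened distributions
    hard term: cross-entropy of the student against the ground-truth label
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps)
               for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


if __name__ == "__main__":
    # Hypothetical logits for a binary text matching decision (match / no match).
    teacher = [2.0, -1.0]
    print(distillation_loss([1.5, -0.5], teacher, label=0))
    print(distillation_loss([-1.0, 2.0], teacher, label=0))
```

A student whose logits agree with the teacher receives a lower loss than one that contradicts it, which is the signal a small student model would be trained on in addition to the hard labels.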
