Multi-stage enhanced representation learning for document reranking based on query view

Hai Liu, Xiaozhi Zhu, Yong Tang, Chaobo He, Tianyong Hao
DOI: 10.1007/s11280-024-01296-x
Journal: World Wide Web
Published: 2024-08-21 (Journal Article)
Citations: 0

Abstract

Large language models can implicitly extract informative semantic features from queries and candidate documents to achieve impressive reranking performance. However, a large model relies on its sheer number of parameters to do so, and it is not known exactly what semantic information has been learned. In this paper, we propose MERL, a multi-stage enhanced representation learning method based on query views, with an Intra-query stage and an Inter-query stage that guide the model to explicitly learn the semantic relationship between a query and its documents. In the Intra-query training stage, a content-based contrastive learning module that does not rely on BERT's special [CLS] token is used to optimize the semantic similarity between a query and its relevant documents. In the Inter-query training stage, an entity-oriented masked query prediction task establishes semantic relations within query-document pairs, and an Inter-query contrastive learning module extracts similar matching patterns across query-relevant documents. Extensive experiments on the MS MARCO passage ranking and TREC DL datasets show that MERL obtains significant improvements over baseline models with a small number of parameters.
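The abstract does not give the exact loss formulation, but the contrastive learning modules it describes typically follow an in-batch InfoNCE objective: each query's paired relevant document is the positive, and the other documents in the batch serve as negatives. The sketch below is a minimal, hedged illustration of that general pattern (the `temperature` value and cosine scoring are assumptions, not details from the paper):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(query_vecs, doc_vecs, temperature=0.05):
    """In-batch contrastive (InfoNCE-style) loss.

    The document at the same index as a query is its positive;
    every other document in the batch acts as a negative.
    """
    losses = []
    for i, q in enumerate(query_vecs):
        sims = [cosine(q, d) / temperature for d in doc_vecs]
        log_denom = math.log(sum(math.exp(s) for s in sims))
        # negative log-softmax probability of the positive document
        losses.append(log_denom - sims[i])
    return sum(losses) / len(losses)
```

The loss is small when each query embedding is closest to its own document and grows when queries align with the wrong documents, which is what drives the representations of matching query-document pairs together.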

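For the Inter-query stage, the abstract mentions entity-oriented masked query prediction: entity tokens in the query are masked so the model must recover them from the paired document, tying query semantics to document content. The following is a hypothetical sketch of the masking step only (the token list, entity set, and `[MASK]` convention are illustrative assumptions; the paper's actual entity recognition and prediction heads are not specified here):

```python
def mask_entities(query_tokens, entities, mask_token="[MASK]"):
    """Replace query tokens that belong to a known entity set with a
    mask token, returning the masked query and the prediction targets."""
    entity_set = {t.lower() for t in entities}
    masked, targets = [], []
    for tok in query_tokens:
        if tok.lower() in entity_set:
            masked.append(mask_token)
            targets.append(tok)  # the model must predict this token
        else:
            masked.append(tok)
    return masked, targets
```

A model trained to fill these masks from the relevant document is pushed to encode the entity-level semantic relation between the query and that document.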
