第一阶段检索的语义模型:综述

Yinqiong Cai, Yixing Fan, Jiafeng Guo, Fei Sun, Ruqing Zhang, Xueqi Cheng
{"title":"第一阶段检索的语义模型:综述","authors":"Yinqiong Cai, Yixing Fan, Jiafeng Guo, Fei Sun, Ruqing Zhang, Xueqi Cheng","doi":"10.1145/3486250","DOIUrl":null,"url":null,"abstract":"Multi-stage ranking pipelines have been a practical solution in modern search systems, where the first-stage retrieval is to return a subset of candidate documents and latter stages attempt to re-rank those candidates. Unlike re-ranking stages going through quick technique shifts over the past decades, the first-stage retrieval has long been dominated by classical term-based models. Unfortunately, these models suffer from the vocabulary mismatch problem, which may block re-ranking stages from relevant documents at the very beginning. Therefore, it has been a long-term desire to build semantic models for the first-stage retrieval that can achieve high recall efficiently. Recently, we have witnessed an explosive growth of research interests on the first-stage semantic retrieval models. We believe it is the right time to survey current status, learn from existing methods, and gain some insights for future development. In this article, we describe the current landscape of the first-stage retrieval models under a unified framework to clarify the connection between classical term-based retrieval methods, early semantic retrieval methods, and neural semantic retrieval methods. Moreover, we identify some open challenges and envision some future directions, with the hope of inspiring more research on these important yet less investigated topics.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"183 1","pages":"1 - 42"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"68","resultStr":"{\"title\":\"Semantic Models for the First-Stage Retrieval: A Comprehensive Review\",\"authors\":\"Yinqiong Cai, Yixing Fan, Jiafeng Guo, Fei Sun, Ruqing Zhang, Xueqi Cheng\",\"doi\":\"10.1145/3486250\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-stage ranking pipelines have been a practical solution in modern search systems, where the first-stage retrieval is to return a subset of candidate documents and latter stages attempt to re-rank those candidates. Unlike re-ranking stages going through quick technique shifts over the past decades, the first-stage retrieval has long been dominated by classical term-based models. Unfortunately, these models suffer from the vocabulary mismatch problem, which may block re-ranking stages from relevant documents at the very beginning. Therefore, it has been a long-term desire to build semantic models for the first-stage retrieval that can achieve high recall efficiently. Recently, we have witnessed an explosive growth of research interests on the first-stage semantic retrieval models. We believe it is the right time to survey current status, learn from existing methods, and gain some insights for future development. In this article, we describe the current landscape of the first-stage retrieval models under a unified framework to clarify the connection between classical term-based retrieval methods, early semantic retrieval methods, and neural semantic retrieval methods. Moreover, we identify some open challenges and envision some future directions, with the hope of inspiring more research on these important yet less investigated topics.\",\"PeriodicalId\":6934,\"journal\":{\"name\":\"ACM Transactions on Information Systems (TOIS)\",\"volume\":\"183 1\",\"pages\":\"1 - 42\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-03-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"68\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Information Systems (TOIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3486250\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Information Systems (TOIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3486250","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 68

摘要

多阶段排序管道在现代搜索系统中已经成为一种实用的解决方案,其中第一阶段检索是返回候选文档的子集,后一阶段尝试对这些候选文档重新排序。与过去几十年快速技术转换的重新排序阶段不同,第一阶段检索长期以来一直由经典的基于术语的模型主导。不幸的是,这些模型存在词汇表不匹配问题,这可能会在一开始就阻碍相关文档的重新排序阶段。因此,为第一阶段检索建立能够有效实现高召回率的语义模型一直是一个长期的愿望。近年来,人们对第一阶段语义检索模型的研究兴趣呈爆炸式增长。我们认为,现在正是审视现状、借鉴现有方法、为未来发展提供一些启示的好时机。在本文中,我们在一个统一的框架下描述了第一阶段检索模型的现状,以澄清经典的基于术语的检索方法、早期语义检索方法和神经语义检索方法之间的联系。此外,我们确定了一些开放的挑战,并设想了一些未来的方向,希望在这些重要但研究较少的主题上激发更多的研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Semantic Models for the First-Stage Retrieval: A Comprehensive Review
Multi-stage ranking pipelines have been a practical solution in modern search systems, where the first-stage retrieval is to return a subset of candidate documents and latter stages attempt to re-rank those candidates. Unlike re-ranking stages going through quick technique shifts over the past decades, the first-stage retrieval has long been dominated by classical term-based models. Unfortunately, these models suffer from the vocabulary mismatch problem, which may block re-ranking stages from relevant documents at the very beginning. Therefore, it has been a long-term desire to build semantic models for the first-stage retrieval that can achieve high recall efficiently. Recently, we have witnessed an explosive growth of research interests on the first-stage semantic retrieval models. We believe it is the right time to survey current status, learn from existing methods, and gain some insights for future development. In this article, we describe the current landscape of the first-stage retrieval models under a unified framework to clarify the connection between classical term-based retrieval methods, early semantic retrieval methods, and neural semantic retrieval methods. Moreover, we identify some open challenges and envision some future directions, with the hope of inspiring more research on these important yet less investigated topics.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Collaborative Graph Learning for Session-based Recommendation GraphHINGE: Learning Interaction Models of Structured Neighborhood on Heterogeneous Information Network Scalable Representation Learning for Dynamic Heterogeneous Information Networks via Metagraphs Complex-valued Neural Network-based Quantum Language Models eFraudCom: An E-commerce Fraud Detection System via Competitive Graph Neural Networks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1