Injecting the BM25 Score as Text Improves BERT-Based Re-rankers

Arian Askari, Amin Abolghasemi, G. Pasi, Wessel Kraaij, S. Verberne
{"title":"Injecting the BM25 Score as Text Improves BERT-Based Re-rankers","authors":"Arian Askari, Amin Abolghasemi, G. Pasi, Wessel Kraaij, S. Verberne","doi":"10.48550/arXiv.2301.09728","DOIUrl":null,"url":null,"abstract":"In this paper we propose a novel approach for combining first-stage lexical retrieval models and Transformer-based re-rankers: we inject the relevance score of the lexical model as a token in the middle of the input of the cross-encoder re-ranker. It was shown in prior work that interpolation between the relevance score of lexical and BERT-based re-rankers may not consistently result in higher effectiveness. Our idea is motivated by the finding that BERT models can capture numeric information. We compare several representations of the BM25 score and inject them as text in the input of four different cross-encoders. We additionally analyze the effect for different query types, and investigate the effectiveness of our method for capturing exact matching relevance. Evaluation on the MSMARCO Passage collection and the TREC DL collections shows that the proposed method significantly improves over all cross-encoder re-rankers as well as the common interpolation methods. We show that the improvement is consistent for all query types. We also find an improvement in exact matching capabilities over both BM25 and the cross-encoders. Our findings indicate that cross-encoder re-rankers can efficiently be improved without additional computational burden and extra steps in the pipeline by explicitly adding the output of the first-stage ranker to the model input, and this effect is robust for different models and query types.","PeriodicalId":126309,"journal":{"name":"European Conference on Information Retrieval","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Conference on Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2301.09728","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

In this paper we propose a novel approach for combining first-stage lexical retrieval models and Transformer-based re-rankers: we inject the relevance score of the lexical model as a token in the middle of the input of the cross-encoder re-ranker. It was shown in prior work that interpolation between the relevance score of lexical and BERT-based re-rankers may not consistently result in higher effectiveness. Our idea is motivated by the finding that BERT models can capture numeric information. We compare several representations of the BM25 score and inject them as text in the input of four different cross-encoders. We additionally analyze the effect for different query types, and investigate the effectiveness of our method for capturing exact matching relevance. Evaluation on the MSMARCO Passage collection and the TREC DL collections shows that the proposed method significantly improves over all cross-encoder re-rankers as well as the common interpolation methods. We show that the improvement is consistent for all query types. We also find an improvement in exact matching capabilities over both BM25 and the cross-encoders. Our findings indicate that cross-encoder re-rankers can efficiently be improved without additional computational burden and extra steps in the pipeline by explicitly adding the output of the first-stage ranker to the model input, and this effect is robust for different models and query types.
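To make the core idea concrete, below is a minimal sketch of injecting a first-stage BM25 score, verbalized as text, into a cross-encoder's input between the query and the passage. This is an illustration under assumptions, not the paper's exact setup: the checkpoint name (cross-encoder/ms-marco-MiniLM-L-6-v2), the choice to round the score to two decimals, and appending the score to the query segment are illustrative choices; the paper compares several score representations.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative checkpoint choice; any BERT-style cross-encoder re-ranker would do.
MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def rerank_score(query: str, passage: str, bm25_score: float) -> float:
    """Score a (query, passage) pair with the BM25 score injected as plain text."""
    # Verbalize the lexical score; here it is simply rounded to two decimals
    # (an assumption -- the paper studies several textual representations).
    score_text = f"{bm25_score:.2f}"
    # First segment: query plus injected score; second segment: the passage.
    # The tokenizer places the separator token between the two segments,
    # so the score ends up in the middle of the cross-encoder input.
    first_segment = f"{query} {score_text}"
    inputs = tokenizer(first_segment, passage, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze().item()

# Example usage with a made-up BM25 score from the first-stage ranker.
print(rerank_score("what is bm25",
                   "BM25 is a ranking function used by search engines.",
                   17.83))

For contrast, the interpolation baselines that the paper compares against combine the two scores only after re-ranking, e.g. alpha * bm25_score + (1 - alpha) * cross_encoder_score with a tuned alpha, rather than exposing the lexical score to the cross-encoder itself.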