Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, Fine-tuning and Deploying Rerankers for RAG

Gabriel de Souza P. Moreira, Ronay Ak, Benedikt Schifferer, Mengyao Xu, Radek Osmulski, Even Oldridge
{"title":"利用排名模型加强问答文本检索:为 RAG 制定基准、微调和部署 Rerankers","authors":"Gabriel de Souza P. Moreira, Ronay Ak, Benedikt Schifferer, Mengyao Xu, Radek Osmulski, Even Oldridge","doi":"arxiv-2409.07691","DOIUrl":null,"url":null,"abstract":"Ranking models play a crucial role in enhancing overall accuracy of text\nretrieval systems. These multi-stage systems typically utilize either dense\nembedding models or sparse lexical indices to retrieve relevant passages based\non a given query, followed by ranking models that refine the ordering of the\ncandidate passages by its relevance to the query. This paper benchmarks various publicly available ranking models and examines\ntheir impact on ranking accuracy. We focus on text retrieval for\nquestion-answering tasks, a common use case for Retrieval-Augmented Generation\nsystems. Our evaluation benchmarks include models some of which are\ncommercially viable for industrial applications. We introduce a state-of-the-art ranking model, NV-RerankQA-Mistral-4B-v3,\nwhich achieves a significant accuracy increase of ~14% compared to pipelines\nwith other rerankers. We also provide an ablation study comparing the\nfine-tuning of ranking models with different sizes, losses and self-attention\nmechanisms. Finally, we discuss challenges of text retrieval pipelines with ranking\nmodels in real-world industry applications, in particular the trade-offs among\nmodel size, ranking accuracy and system requirements like indexing and serving\nlatency / throughput.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG\",\"authors\":\"Gabriel de Souza P. Moreira, Ronay Ak, Benedikt Schifferer, Mengyao Xu, Radek Osmulski, Even Oldridge\",\"doi\":\"arxiv-2409.07691\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Ranking models play a crucial role in enhancing overall accuracy of text\\nretrieval systems. These multi-stage systems typically utilize either dense\\nembedding models or sparse lexical indices to retrieve relevant passages based\\non a given query, followed by ranking models that refine the ordering of the\\ncandidate passages by its relevance to the query. This paper benchmarks various publicly available ranking models and examines\\ntheir impact on ranking accuracy. We focus on text retrieval for\\nquestion-answering tasks, a common use case for Retrieval-Augmented Generation\\nsystems. Our evaluation benchmarks include models some of which are\\ncommercially viable for industrial applications. We introduce a state-of-the-art ranking model, NV-RerankQA-Mistral-4B-v3,\\nwhich achieves a significant accuracy increase of ~14% compared to pipelines\\nwith other rerankers. We also provide an ablation study comparing the\\nfine-tuning of ranking models with different sizes, losses and self-attention\\nmechanisms. 
Finally, we discuss challenges of text retrieval pipelines with ranking\\nmodels in real-world industry applications, in particular the trade-offs among\\nmodel size, ranking accuracy and system requirements like indexing and serving\\nlatency / throughput.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07691\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07691","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Ranking models play a crucial role in enhancing the overall accuracy of text retrieval systems. These multi-stage systems typically use either dense embedding models or sparse lexical indices to retrieve candidate passages for a given query, followed by ranking models that refine the ordering of those candidates by their relevance to the query. This paper benchmarks various publicly available ranking models and examines their impact on ranking accuracy. We focus on text retrieval for question-answering tasks, a common use case for Retrieval-Augmented Generation systems. Our evaluation benchmarks include models, some of which are commercially viable for industrial applications. We introduce a state-of-the-art ranking model, NV-RerankQA-Mistral-4B-v3, which achieves a significant accuracy increase of ~14% compared to pipelines with other rerankers. We also provide an ablation study comparing the fine-tuning of ranking models of different sizes, with different losses and self-attention mechanisms. Finally, we discuss the challenges of text retrieval pipelines with ranking models in real-world industry applications, in particular the trade-offs among model size, ranking accuracy, and system requirements such as indexing and serving latency/throughput.
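The paper studies two-stage retrieval pipelines: a dense embedding model or sparse lexical index first retrieves candidate passages, and a reranking model then re-scores them against the query. The sketch below illustrates that general retrieve-then-rerank pattern using generic, publicly available sentence-transformers checkpoints; the model names, example passages, and top-k value are illustrative stand-ins, not the models or benchmarks evaluated in the paper.

```python
# Minimal retrieve-then-rerank sketch. The bi-encoder and cross-encoder
# checkpoints below are generic public models chosen for illustration;
# they are not the rerankers benchmarked in the paper.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

passages = [
    "Dense retrieval uses embedding similarity to find candidate passages.",
    "Cross-encoder rerankers score each query-passage pair jointly.",
    "Sparse lexical indices match queries and passages on overlapping terms.",
]
query = "How do rerankers improve retrieval accuracy?"

# Stage 1: candidate generation with a bi-encoder (dense retrieval).
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
passage_emb = bi_encoder.encode(passages, convert_to_tensor=True)
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, passage_emb, top_k=3)[0]

# Stage 2: rerank the candidates with a cross-encoder, which attends
# jointly over the query and each candidate passage.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [(query, passages[h["corpus_id"]]) for h in hits]
scores = reranker.predict(pairs)

# Sort candidates by the reranker's relevance scores.
for score, (_, passage) in sorted(zip(scores, pairs), reverse=True):
    print(f"{score:.3f}  {passage}")
```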
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG
Ranking models play a crucial role in enhancing overall accuracy of text retrieval systems. These multi-stage systems typically utilize either dense embedding models or sparse lexical indices to retrieve relevant passages based on a given query, followed by ranking models that refine the ordering of the candidate passages by its relevance to the query. This paper benchmarks various publicly available ranking models and examines their impact on ranking accuracy. We focus on text retrieval for question-answering tasks, a common use case for Retrieval-Augmented Generation systems. Our evaluation benchmarks include models some of which are commercially viable for industrial applications. We introduce a state-of-the-art ranking model, NV-RerankQA-Mistral-4B-v3, which achieves a significant accuracy increase of ~14% compared to pipelines with other rerankers. We also provide an ablation study comparing the fine-tuning of ranking models with different sizes, losses and self-attention mechanisms. Finally, we discuss challenges of text retrieval pipelines with ranking models in real-world industry applications, in particular the trade-offs among model size, ranking accuracy and system requirements like indexing and serving latency / throughput.
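The ablation study mentioned in the abstract compares fine-tuning rerankers of different sizes, with different losses and self-attention mechanisms. As a rough illustration of what fine-tuning a cross-encoder reranker on (query, passage, relevance) pairs looks like in general, and not the paper's actual recipe, here is a minimal sketch using the sentence-transformers CrossEncoder API; the base checkpoint, toy training pairs, and hyperparameters are placeholders.

```python
# Hedged sketch of cross-encoder reranker fine-tuning. The base checkpoint,
# toy training pairs, and hyperparameters are illustrative only; the paper's
# own setup (model sizes, losses, attention variants) is not reproduced here.
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Toy positive/negative query-passage pairs with binary relevance labels.
train_samples = [
    InputExample(texts=["what is a reranker?",
                        "A reranker re-scores candidate passages for a query."],
                 label=1.0),
    InputExample(texts=["what is a reranker?",
                        "The recipe calls for two cups of flour."],
                 label=0.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=2)

# num_labels=1 yields a single relevance score per query-passage pair.
model = CrossEncoder("distilroberta-base", num_labels=1)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=0)

# After fine-tuning, the model scores unseen query-passage pairs.
print(model.predict([["what is a reranker?",
                      "Rerankers refine the ordering produced by a retriever."]]))
```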