Learning a Better Negative Sampling Policy with Deep Neural Networks for Search

Daniel Cohen, Scott M. Jordan, W. Bruce Croft
{"title":"Learning a Better Negative Sampling Policy with Deep Neural Networks for Search","authors":"Daniel Cohen, Scott M. Jordan, W. Bruce Croft","doi":"10.1145/3341981.3344220","DOIUrl":null,"url":null,"abstract":"In information retrieval, sampling methods used to select documents for neural models must often deal with large class imbalances during training. This issue necessitates careful selection of negative instances when training neural models to avoid the risk of overfitting. For most work, heuristic sampling approaches, or policies, are created based off of domain experts, such as choosing samples with high BM25 scores or a random process over candidate documents. However, these sampling approaches are done with the test distribution in mind. In this paper, we demonstrate that the method chosen to sample negative documents during training plays a critical role in both the stability of training, as well as overall performance. Furthermore, we establish that using reinforcement learning to optimize a policy over a set of sampling functions can significantly improve performance over standard training practices with respect to IR metrics and is robust to hyperparameters and random seeds.","PeriodicalId":173154,"journal":{"name":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","volume":"176 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3341981.3344220","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

In information retrieval, sampling methods used to select documents for neural models must often contend with large class imbalances during training. This imbalance necessitates careful selection of negative instances when training neural models, to avoid the risk of overfitting. In most work, heuristic sampling approaches, or policies, are crafted from domain expertise, such as choosing samples with high BM25 scores or sampling randomly from the candidate documents. However, these sampling approaches are designed with the test distribution in mind. In this paper, we demonstrate that the method chosen to sample negative documents during training plays a critical role in both the stability of training and overall performance. Furthermore, we establish that using reinforcement learning to optimize a policy over a set of sampling functions can significantly improve performance over standard training practices with respect to IR metrics, and is robust to hyperparameters and random seeds.
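To make the abstract's central mechanism concrete, the sketch below frames the choice among negative-sampling functions as a small reinforcement-learning (bandit) problem in Python: a policy selects either a random sampler or a BM25-top-k sampler for each training batch and updates its preference from a scalar reward. This is a minimal illustration under stated assumptions, not the authors' actual formulation; the sampler set, the epsilon-greedy update, and the reward signal (a stand-in for the change in a held-out IR metric) are all placeholders.

```python
import random

# Illustrative sketch only (not the paper's exact method): an epsilon-greedy
# bandit that picks one of several negative-sampling functions per training
# batch and updates that sampler's value estimate from a scalar reward.

def random_negatives(candidates, k):
    """Sample k negatives uniformly from (doc_id, bm25_score) pairs."""
    return random.sample(candidates, k)

def bm25_top_negatives(candidates, k):
    """Take the k highest-BM25-scoring non-relevant candidates."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]

class SamplingPolicy:
    """Epsilon-greedy policy over a fixed set of sampling functions."""

    def __init__(self, samplers, epsilon=0.1):
        self.samplers = samplers
        self.epsilon = epsilon
        self.values = [0.0] * len(samplers)  # running reward estimates
        self.counts = [0] * len(samplers)

    def select(self):
        """Explore with probability epsilon, otherwise exploit."""
        if random.random() < self.epsilon:
            return random.randrange(len(self.samplers))
        return max(range(len(self.samplers)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        """Incremental-mean update of the chosen sampler's value."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Toy usage with synthetic candidates; in practice the reward would be the
# improvement of a validation IR metric (e.g. MAP) after training the batch.
candidate_docs = [(f"doc{i}", random.random() * 10) for i in range(100)]
policy = SamplingPolicy([random_negatives, bm25_top_negatives])
for step in range(20):
    arm = policy.select()
    negatives = policy.samplers[arm](candidate_docs, k=5)
    # ... train the neural ranker on positives + these negatives ...
    reward = random.random()  # stand-in for the validation-metric delta
    policy.update(arm, reward)
print("estimated sampler values:", policy.values)
```

The design choice to model sampler selection as a bandit (rather than, say, a fixed schedule) is what lets the training procedure adapt: if BM25-hard negatives destabilize early training, their estimated value drops and the policy shifts toward easier random negatives, and vice versa.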