CAGS: Context-Aware Document Ranking With Contrastive Graph Sampling

IF 8.9 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | IEEE Transactions on Knowledge and Data Engineering | Pub Date: 2024-11-05 | DOI: 10.1109/TKDE.2024.3491996

Zhaoheng Huang; Yutao Zhu; Zhicheng Dou; Ji-Rong Wen

IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 1, pp. 89-101. https://ieeexplore.ieee.org/document/10742917/

Citations: 0

Abstract

In search sessions, the series of interactions in the context has been shown to help capture users’ search intents. Existing studies show that designing pre-training tasks and data augmentation strategies for session search improves the robustness and generalizability of the model. However, such data augmentation strategies focus only on changing the original session structure to learn a better representation. Because they ignore information from outside the session, simply reordering and deleting historical behaviors cannot adequately capture users’ diverse and complex intents, which limits these strategies. To address insufficient modeling under complex user intents, we propose exploiting information outside the original session. In this paper, we sample queries and documents from the global click-on and follow-up session graph, alter an original session with these samples, and construct a new session that shares a similar user intent with the original one. Specifically, we design four data augmentation strategies based on session graphs, covering both one-hop and multi-hop structures, to sample intent-associated query/document nodes. Experiments conducted on three large-scale public datasets demonstrate that our model outperforms existing ad-hoc and context-aware document ranking models.
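The graph-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the four strategies, so the toy `SESSIONS` data, the function names (`build_graphs`, `multi_hop_queries`, `augment_session`), and the two swap rules below are all assumptions chosen for illustration. The sketch builds a global click graph (query to clicked documents) and a follow-up graph (query to next query), then alters one position of a session using one-hop or multi-hop neighbours.

```python
import random
from collections import defaultdict

# Hypothetical toy sessions: each is a list of (query, clicked_document) pairs.
SESSIONS = [
    [("python sort list", "d1"), ("python sorted key", "d2")],
    [("python sort list", "d3"), ("python lambda", "d4")],
]

def build_graphs(sessions):
    """Build two global graphs: query -> clicked docs, query -> follow-up queries."""
    clicks = defaultdict(set)
    follow_ups = defaultdict(set)
    for session in sessions:
        for i, (query, doc) in enumerate(session):
            clicks[query].add(doc)
            if i + 1 < len(session):
                follow_ups[query].add(session[i + 1][0])
    return clicks, follow_ups

def multi_hop_queries(follow_ups, start, hops):
    """Queries reachable from `start` within `hops` follow-up edges (start excluded)."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {n for q in frontier for n in follow_ups.get(q, ())} - seen
        seen |= frontier
    return seen - {start}

def augment_session(session, clicks, follow_ups, rng, hops=1):
    """Return a copy of `session` with at most one position altered by graph samples."""
    augmented = list(session)
    pos = rng.randrange(len(session))
    query, doc = session[pos]
    # Assumed rule 1: swap in another document clicked for the same query.
    other_docs = sorted(clicks[query] - {doc})
    if other_docs:
        augmented[pos] = (query, rng.choice(other_docs))
        return augmented
    # Assumed rule 2: swap the query for a neighbour in the follow-up graph.
    neighbours = sorted(multi_hop_queries(follow_ups, query, hops))
    if neighbours:
        new_query = rng.choice(neighbours)
        new_docs = sorted(clicks.get(new_query, {doc}))
        augmented[pos] = (new_query, rng.choice(new_docs))
    return augmented
```

Calling `augment_session(SESSIONS[0], *build_graphs(SESSIONS), random.Random(0))` yields a session that differs from the original in at most one step. Given the "contrastive" in the title, the original and augmented sessions would plausibly serve as a positive pair, though the abstract does not spell out the training objective.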




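Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70

Self-citation rate: 3.40%

Articles published: 515

Review time: 6 months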

Journal description: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.

Latest articles in this journal

2024 Reviewers List

Web-FTP: A Feature Transferring-Based Pre-Trained Model for Web Attack Detection

Network-to-Network: Self-Supervised Network Representation Learning via Position Prediction

AEGK: Aligned Entropic Graph Kernels Through Continuous-Time Quantum Walks

Contextual Inference From Sparse Shopping Transactions Based on Motif Patterns