学习对用户查询进行排序以检测搜索任务

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval Pub Date : 2016-09-12 DOI:10.1145/2970398.2970407

C. Lucchese, F. M. Nardini, S. Orlando, Gabriele Tolomei

{"title":"学习对用户查询进行排序以检测搜索任务","authors":"C. Lucchese, F. M. Nardini, S. Orlando, Gabriele Tolomei","doi":"10.1145/2970398.2970407","DOIUrl":null,"url":null,"abstract":"We present a framework for discovering sets of web queries having similar latent needs, called search tasks, from user queries stored in a search engine log. The framework is made of two main modules: Query Similarity Learning (QSL) and Graph-based Query Clustering (GQC). The former is devoted to learning a query similarity function from a ground truth of manually-labeled search tasks. The latter represents each user search log as a graph whose nodes are queries, and uses the learned similarity function to weight edges between query pairs. Finally, search tasks are detected by clustering those queries in the graph which are connected by the strongest links, in fact by detecting the strongest connected components of the graph. To discriminate between \"strong\" and \"weak\" links also the GQC module entails a learning phase whose goal is to estimate the best threshold for pruning the edges of the graph. We discuss how the QSL module can be effectively implemented using Learning to Rank (L2R) techniques. Experiments on a real-world search engine log show that query similarity functions learned using L2R lead to better performing GQC implementations when compared to similarity functions induced by other state-of-the-art machine learning solutions, such as logistic regression and decision trees.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Learning to Rank User Queries to Detect Search Tasks\",\"authors\":\"C. Lucchese, F. M. Nardini, S. Orlando, Gabriele Tolomei\",\"doi\":\"10.1145/2970398.2970407\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a framework for discovering sets of web queries having similar latent needs, called search tasks, from user queries stored in a search engine log. The framework is made of two main modules: Query Similarity Learning (QSL) and Graph-based Query Clustering (GQC). The former is devoted to learning a query similarity function from a ground truth of manually-labeled search tasks. The latter represents each user search log as a graph whose nodes are queries, and uses the learned similarity function to weight edges between query pairs. Finally, search tasks are detected by clustering those queries in the graph which are connected by the strongest links, in fact by detecting the strongest connected components of the graph. To discriminate between \\\"strong\\\" and \\\"weak\\\" links also the GQC module entails a learning phase whose goal is to estimate the best threshold for pruning the edges of the graph. We discuss how the QSL module can be effectively implemented using Learning to Rank (L2R) techniques. Experiments on a real-world search engine log show that query similarity functions learned using L2R lead to better performing GQC implementations when compared to similarity functions induced by other state-of-the-art machine learning solutions, such as logistic regression and decision trees.\",\"PeriodicalId\":443715,\"journal\":{\"name\":\"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval\",\"volume\":\"83 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2970398.2970407\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2970398.2970407","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

我们提出了一个框架，用于从存储在搜索引擎日志中的用户查询中发现具有相似潜在需求的web查询集，称为搜索任务。该框架由两个主要模块组成:查询相似学习(QSL)和基于图的查询聚类(GQC)。前者致力于从人工标记的搜索任务的基本事实中学习查询相似度函数。后者将每个用户搜索日志表示为以查询为节点的图，并使用学习到的相似度函数对查询对之间的边进行加权。最后，通过聚类图中由最强链接连接的查询来检测搜索任务，实际上是通过检测图中最强连接的组件。为了区分“强”和“弱”链接，GQC模块还需要一个学习阶段，其目标是估计修剪图边的最佳阈值。我们讨论了如何使用学习排序(L2R)技术有效地实现QSL模块。在真实搜索引擎日志上的实验表明，与其他最先进的机器学习解决方案(如逻辑回归和决策树)诱导的相似函数相比，使用L2R学习的查询相似函数可以更好地实现GQC。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Learning to Rank User Queries to Detect Search Tasks

We present a framework for discovering sets of web queries having similar latent needs, called search tasks, from user queries stored in a search engine log. The framework is made of two main modules: Query Similarity Learning (QSL) and Graph-based Query Clustering (GQC). The former is devoted to learning a query similarity function from a ground truth of manually-labeled search tasks. The latter represents each user search log as a graph whose nodes are queries, and uses the learned similarity function to weight edges between query pairs. Finally, search tasks are detected by clustering those queries in the graph which are connected by the strongest links, in fact by detecting the strongest connected components of the graph. To discriminate between "strong" and "weak" links also the GQC module entails a learning phase whose goal is to estimate the best threshold for pruning the edges of the graph. We discuss how the QSL module can be effectively implemented using Learning to Rank (L2R) techniques. Experiments on a real-world search engine log show that query similarity functions learned using L2R lead to better performing GQC implementations when compared to similarity functions induced by other state-of-the-art machine learning solutions, such as logistic regression and decision trees.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval

自引率

0.00%

发文量

期刊最新文献

A Simple and Effective Approach to Score Standardisation Understanding the Message of Images with Knowledge Base Traversals A Topical Approach to Retrievability Bias Estimation Efficient and Effective Higher Order Proximity Modeling Cross-Language Microblog Retrieval using Latent Semantic Modeling