我做得怎么样?:离线评估会话搜索系统

ACM Transactions on Information Systems (TOIS) Pub Date : 2021-08-17 DOI:10.1145/3451160

Aldo Lipani, Ben Carterette, Emine Yilmaz

{"title":"我做得怎么样?:离线评估会话搜索系统","authors":"Aldo Lipani, Ben Carterette, Emine Yilmaz","doi":"10.1145/3451160","DOIUrl":null,"url":null,"abstract":"As conversational agents like Siri and Alexa gain in popularity and use, conversation is becoming a more and more important mode of interaction for search. Conversational search shares some features with traditional search, but differs in some important respects: conversational search systems are less likely to return ranked lists of results (a SERP), more likely to involve iterated interactions, and more likely to feature longer, well-formed user queries in the form of natural language questions. Because of these differences, traditional methods for search evaluation (such as the Cranfield paradigm) do not translate easily to conversational search. In this work, we propose a framework for offline evaluation of conversational search, which includes a methodology for creating test collections with relevance judgments, an evaluation measure based on a user interaction model, and an approach to collecting user interaction data to train the model. The framework is based on the idea of “subtopics”, often used to model novelty and diversity in search and recommendation, and the user model is similar to the geometric browsing model introduced by RBP and used in ERR. As far as we know, this is the first work to combine these ideas into a comprehensive framework for offline evaluation of conversational search.","PeriodicalId":6934,"journal":{"name":"ACM Transactions on Information Systems (TOIS)","volume":"10 1","pages":"1 - 22"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":"{\"title\":\"How Am I Doing?: Evaluating Conversational Search Systems Offline\",\"authors\":\"Aldo Lipani, Ben Carterette, Emine Yilmaz\",\"doi\":\"10.1145/3451160\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As conversational agents like Siri and Alexa gain in popularity and use, conversation is becoming a more and more important mode of interaction for search. Conversational search shares some features with traditional search, but differs in some important respects: conversational search systems are less likely to return ranked lists of results (a SERP), more likely to involve iterated interactions, and more likely to feature longer, well-formed user queries in the form of natural language questions. Because of these differences, traditional methods for search evaluation (such as the Cranfield paradigm) do not translate easily to conversational search. In this work, we propose a framework for offline evaluation of conversational search, which includes a methodology for creating test collections with relevance judgments, an evaluation measure based on a user interaction model, and an approach to collecting user interaction data to train the model. The framework is based on the idea of “subtopics”, often used to model novelty and diversity in search and recommendation, and the user model is similar to the geometric browsing model introduced by RBP and used in ERR. As far as we know, this is the first work to combine these ideas into a comprehensive framework for offline evaluation of conversational search.\",\"PeriodicalId\":6934,\"journal\":{\"name\":\"ACM Transactions on Information Systems (TOIS)\",\"volume\":\"10 1\",\"pages\":\"1 - 22\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"33\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Information Systems (TOIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3451160\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Information Systems (TOIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3451160","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 33

摘要

随着Siri和Alexa等对话代理的普及和使用，对话正成为搜索中越来越重要的交互模式。会话搜索与传统搜索共享一些特性，但在一些重要方面有所不同:会话搜索系统不太可能返回排序结果列表(SERP)，更可能涉及迭代交互，并且更可能以自然语言问题的形式提供较长、格式良好的用户查询。由于这些差异，传统的搜索评估方法(如克兰菲尔德范式)不容易转化为会话搜索。在这项工作中，我们提出了一个离线评估会话搜索的框架，其中包括一种创建具有相关性判断的测试集的方法，一种基于用户交互模型的评估方法，以及一种收集用户交互数据以训练模型的方法。该框架基于“子主题”的思想，通常用于对搜索和推荐中的新颖性和多样性进行建模，用户模型类似于RBP引入并用于ERR的几何浏览模型。据我们所知，这是第一个将这些想法结合成一个全面的框架来评估会话搜索的工作。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

How Am I Doing?: Evaluating Conversational Search Systems Offline

As conversational agents like Siri and Alexa gain in popularity and use, conversation is becoming a more and more important mode of interaction for search. Conversational search shares some features with traditional search, but differs in some important respects: conversational search systems are less likely to return ranked lists of results (a SERP), more likely to involve iterated interactions, and more likely to feature longer, well-formed user queries in the form of natural language questions. Because of these differences, traditional methods for search evaluation (such as the Cranfield paradigm) do not translate easily to conversational search. In this work, we propose a framework for offline evaluation of conversational search, which includes a methodology for creating test collections with relevance judgments, an evaluation measure based on a user interaction model, and an approach to collecting user interaction data to train the model. The framework is based on the idea of “subtopics”, often used to model novelty and diversity in search and recommendation, and the user model is similar to the geometric browsing model introduced by RBP and used in ERR. As far as we know, this is the first work to combine these ideas into a comprehensive framework for offline evaluation of conversational search.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Information Systems (TOIS)

自引率

0.00%

发文量