Foundations and Trends in Information Retrieval: latest articles, page 3

Neural Approaches to Conversational AI

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2019-02-20 DOI: 10.1561/1500000074

Jianfeng Gao, Michel Galley, Lihong Li

The present paper surveys neural approaches to conversational AI that have been developed in the last few years. We group conversational systems into three categories: (1) question answering agents, (2) task-oriented dialogue agents, and (3) chatbots. For each category, we present a review of state-of-the-art neural approaches, draw the connection between them and traditional approaches, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies.

Citations: 0

Efficient Query Processing for Scalable Web Search

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2018-12-23 DOI: 10.1561/1500000057

N. Tonellotto, C. Macdonald, I. Ounis

Search engines are exceptionally important tools for accessing information in today’s world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to rapidly evolve, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for the development of efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also providing the latest trends in the literature in efficient query processing, including coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trends in applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction), and making per-query tradeoffs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met.
Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time, energy-efficient and modern hardware & software architectures.
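
As a rough illustration of the two basic strategies named in the abstract, the sketch below contrasts term-at-a-time (TAAT) and document-at-a-time (DAAT) scoring. It is not taken from the survey: the toy `INDEX`, the precomputed per-posting scores, and the function names `taat` and `daat` are all assumptions made for this example, and no dynamic pruning (such as WAND or BMW) is attempted.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index: term -> postings sorted by docid.
# Each posting is (docid, score contribution of the term for that document).
INDEX = {
    "neural": [(1, 0.5), (3, 1.25), (7, 0.5)],
    "ranking": [(1, 1.0), (2, 0.25), (7, 1.5)],
}


def taat(query_terms, index, k):
    """Term-at-a-time: traverse one posting list fully before the next,
    keeping a partial score (accumulator) for every document seen."""
    accumulators = defaultdict(float)
    for term in query_terms:
        for docid, score in index.get(term, []):
            accumulators[docid] += score
    # Highest score first; ties broken in favour of the smaller docid.
    return heapq.nlargest(
        k, accumulators.items(), key=lambda item: (item[1], -item[0])
    )


def _offer(top_k, k, docid, score):
    """Keep only the k best (score, -docid) entries in a min-heap."""
    entry = (score, -docid)
    if len(top_k) < k:
        heapq.heappush(top_k, entry)
    elif entry > top_k[0]:
        heapq.heapreplace(top_k, entry)


def daat(query_terms, index, k):
    """Document-at-a-time: walk all posting lists in docid order, so each
    document is fully scored before the next one is considered."""
    if k <= 0:
        return []
    postings = heapq.merge(*(index.get(t, []) for t in query_terms))
    top_k = []
    current_doc, current_score = None, 0.0
    for docid, score in postings:
        if docid != current_doc:
            if current_doc is not None:
                _offer(top_k, k, current_doc, current_score)
            current_doc, current_score = docid, 0.0
        current_score += score
    if current_doc is not None:
        _offer(top_k, k, current_doc, current_score)
    return [(-neg_doc, s) for s, neg_doc in sorted(top_k, reverse=True)]


top_taat = taat(["neural", "ranking"], INDEX, k=2)
top_daat = daat(["neural", "ranking"], INDEX, k=2)
```

Both functions return the same ranking on this toy index. The difference is in memory and access pattern: the TAAT version holds an accumulator for every matching document, while the DAAT version holds only the current document's score and a heap of at most `k` results.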

Citations: 40

An Introduction to Neural Information Retrieval

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2018-12-23 DOI: 10.1561/1500000061

Bhaskar Mitra, Nick Craswell

Neural models have been employed in many Information Retrieval scenarios, including ad-hoc retrieval, recommender systems, multi-media search, and even conversational systems that generate answers in response to natural language questions. An Introduction to Neural Information Retrieval provides a tutorial introduction to neural methods for ranking documents in response to a query, an important IR task. The monograph provides a complete picture of neural information retrieval techniques that culminate in supervised neural learning to rank models including deep neural network architectures that are trained end-to-end for ranking tasks. In reaching this point, the authors cover all the important topics, including the learning to rank framework and an overview of deep neural networks. This monograph provides an accessible, yet comprehensive, overview of the state-of-the-art of Neural Information Retrieval.

Citations: 300

Explainable Recommendation: A Survey and New Perspectives

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2018-04-01 DOI: 10.1561/1500000066

Yongfeng Zhang, Xu Chen

Explainable recommendation attempts to develop models that generate not only high-quality recommendations but also intuitive explanations. The explanations may either be post-hoc or directly come from an explainable model (also called interpretable or transparent model in some contexts). Explainable recommendation tries to address the problem of why: by providing explanations to users or system designers, it helps humans to understand why certain items are recommended by the algorithm, where the human can either be users or system designers. Explainable recommendation helps to improve the transparency, persuasiveness, effectiveness, trustworthiness, and satisfaction of recommendation systems. It also facilitates system designers for better system debugging. In recent years, a large number of explainable recommendation approaches -- especially model-based methods -- have been proposed and applied in real-world systems. In this survey, we provide a comprehensive review for the explainable recommendation research. We first highlight the position of explainable recommendation in recommender system research by categorizing recommendation problems into the 5W, i.e., what, when, who, where, and why. We then conduct a comprehensive survey of explainable recommendation on three perspectives: 1) We provide a chronological research timeline of explainable recommendation. 2) We provide a two-dimensional taxonomy to classify existing explainable recommendation research. 3) We summarize how explainable recommendation applies to different recommendation tasks. We also devote a chapter to discuss the explanation perspectives in broader IR and AI/ML research. We end the survey by discussing potential future directions to promote the explainable recommendation research area and beyond.

Citations: 648

Geographic Information Retrieval: Progress and Challenges in Spatial Search of Text

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2018-02-21 DOI: 10.1561/1500000034

R. Purves, Paul D. Clough, Christopher B. Jones, M. Hall, Vanessa Murdock

Significant amounts of information available today contain references to places on earth. Traditionally such information has been held as structured data and was the concern of Geographic Information Systems (GIS). However, increasing amounts of data in the form of unstructured text are available for indexing and retrieval that also contain spatial references. This monograph describes the field of Geographic Information Retrieval (GIR) that seeks to develop spatially-aware search systems and support user’s geographical information needs. Important concepts with respect to storing, querying and analysing geographical information in computers are introduced, before user needs and interaction in the context of GIR are explored. The task of associating documents with coordinates, prior to their indexing and ranking forms the core of any GIR system, and different approaches and their implications are discussed. Evaluating the resulting systems and their components, and different paradigms for doing so continue to be an important area of research in GIR and are illustrated through several examples. The monograph provides an overview of the research field, and in so doing identifies key remaining research challenges in GIR.

Citations: 77

Web Forum Retrieval and Text Analytics: A Survey

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2018-01-02 DOI: 10.1561/1500000062

D. Hoogeveen, Li Wang, Timothy Baldwin, Karin M. Verspoor

This survey presents an overview of information retrieval, natural language processing and machine learning research that makes use of forum data, including both discussion forums and community question-answering (cQA) archives. The focus is on automated analysis, with the goal of gaining a better understanding of the data and its users. We discuss the different strategies used for both retrieval tasks (post retrieval, question retrieval, and answer retrieval) and classification tasks (post type classification, question classification, post quality assessment, subjectivity, and viewpoint classification) at the post level, as well as at the thread level (thread retrieval, solvedness and task orientation, discourse structure recovery and dialogue act tagging, QA-pair extraction, and thread summarisation). We also review work on forum users, including user satisfaction, expert finding, question recommendation and routing, and community analysis. The survey includes a brief history of forums, an overview of the different kinds of forums, a summary of publicly available datasets for forum research, and a short discussion on the evaluation of retrieval tasks using forum data. The aim is to give a broad overview of the different kinds of forum research, a summary of the methods that have been applied, some insights into successful strategies, and potential areas for future research.

Citations: 34

Applications of Topic Models

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2017-07-13 DOI: 10.1561/1500000030

Jordan L. Boyd-Graber, Yuening Hu, David Mimno

How can a single person understand what’s going on in a collection of millions of documents? This is an increasingly widespread problem: sifting through an organization’s e-mails, understanding a decade worth of newspapers, or characterizing a scientific field’s research. This monograph explores the ways that humans and computers make sense of document collections through tools called topic models. Topic models are a statistical framework that help users understand large document collections; not just to find individual documents but to understand the general themes present in the collection. Applications of Topic Models describes the recent academic and industrial applications of topic models. In addition to topic models’ effective application to traditional problems like information retrieval, visualization, statistical inference, multilingual modeling, and linguistic understanding, Applications of Topic Models also reviews topic models’ ability to unlock large text collections for qualitative analysis. It reviews their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts. Applications of Topic Models is aimed at the reader with some knowledge of document processing, basic understanding of some probability, and interested in many application domains. It discusses the information needs of each application area, and how those specific needs affect models, curation procedures, and interpretations. By the end of the monograph, it is hoped that readers will be excited enough to attempt to embark on building their own topic models. It should also be of interest to topic model experts as the coverage of diverse applications may expose models and approaches they had not seen before.

Citations: 198

Searching the Enterprise

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2017-07-12 DOI: 10.1561/1500000053

Udo Kruschwitz, Charlie Hull

Search has become ubiquitous but that does not mean that search has been solved. Enterprise search, which is broadly speaking the use of information retrieval technology to find information within organisations, is a good example to illustrate this. It is an area that is of huge importance for businesses, yet has attracted relatively little academic interest. This monograph will explore the main issues involved in enterprise search both from a research as well as a practical point of view. We will first plot the landscape of enterprise search and its links to related areas. This will allow us to identify key features before we survey the field in more detail. Throughout the monograph we will discuss the topic as part of the wider information retrieval research field, and we use Web search as a common reference point as this is likely the search application area that the average reader is most familiar with. U. Kruschwitz and C. Hull. Searching the Enterprise. Foundations and Trends® in Information Retrieval, vol. 11, no. 1, pp. 1–142, 2017. DOI: 10.1561/1500000053. Full text available at: http://dx.doi.org/10.1561/1500000053

Citations: 29

Aggregated Search

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2017-03-06 DOI: 10.1561/1500000052

Jaime Arguello

The goal of aggregated search is to provide integrated search across multiple heterogeneous search services in a unified interface: a single query box and a common presentation of results. In the web search domain, aggregated search systems are responsible for integrating results from specialized search services, or verticals, alongside the core web results. For example, search portals such as Google, Bing, and Yahoo! provide access to vertical search engines that focus on different types of media (images and video), different types of search tasks (search for local businesses and online products), and even applications that can help users complete certain tasks (language translation and math calculations). This monograph provides a comprehensive summary of previous research in aggregated search. It starts by describing why aggregated search requires unique solutions. It then discusses different sources of evidence that are likely to be available to an aggregated search system, as well as different techniques for integrating evidence in order to make vertical selection and presentation decisions. Next, it surveys different evaluation methodologies for aggregated search and discusses prior user studies that have aimed to better understand how users behave with aggregated search interfaces. It proceeds to review different advanced topics in aggregated search. It concludes by highlighting the main trends and discussing short-term and long-term areas for future work.

Citations: 20

A Survey of Query Auto Completion in Information Retrieval

IF 10.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2016-09-13 DOI: 10.1561/1500000055

Fei Cai, M. de Rijke

In information retrieval, query auto completion (QAC), also known as type-ahead and auto-complete suggestion, refers to the following functionality: given a prefix consisting of a number of characters entered into a search box, the user interface proposes alternative ways of extending the prefix to a full query. QAC helps users to formulate their query when they have an intent in mind but not a clear way of expressing this in a query. It helps to avoid possible spelling mistakes, especially on devices with small screens. It saves keystrokes and cuts down the search duration of users which implies a lower load on the search engine, and results in savings in machine resources and maintenance. Because of the clear benefits of QAC, a considerable number of algorithmic approaches to QAC have been proposed in the past few years. Query logs have proven to be a key asset underlying most of the recent research. This monograph surveys this research. It focuses on summarizing the literature on QAC and provides a general understanding of the wealth of QAC approaches that are currently available. A Survey of Query Auto Completion in Information Retrieval is an ideal reference on the topic. Its contributions can be summarized as follows: It provides researchers who are working on query auto completion or related problems in the field of information retrieval with a good overview and analysis of state-of-the-art QAC approaches. In particular, for researchers new to the field, the survey can serve as an introduction to the state-of-the-art. It also offers a comprehensive perspective on QAC approaches by presenting a taxonomy of existing solutions. In addition, it presents solutions for QAC under different conditions such as available high-resolution query logs, in-depth user interactions with QAC using eye-tracking, and elaborate user engagements in a QAC process. It also discusses practical issues related to QAC. Lastly, it presents a detailed discussion of core challenges and promising open directions in QAC.
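
To make the QAC functionality described in this abstract concrete, the following is a minimal sketch of one simple frequency-based baseline: rank the logged queries that extend the typed prefix by how often they occur in a query log. It is an illustration under stated assumptions, not a method taken from the monograph; the function names (`build_counts`, `complete`) and the toy log are hypothetical.

```python
from collections import Counter


def build_counts(query_log):
    """Count how often each query appears, after lowercasing and trimming."""
    return Counter(q.strip().lower() for q in query_log if q.strip())


def complete(prefix, counts, k=3):
    """Return up to k logged queries that start with `prefix`.

    Candidates are ordered by descending log frequency; ties are broken
    alphabetically so the output is deterministic.
    """
    p = prefix.lstrip().lower()
    candidates = [(q, n) for q, n in counts.items() if q.startswith(p)]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in candidates[:k]]


# Hypothetical toy query log, used only for illustration.
toy_log = [
    "information retrieval",
    "information retrieval",
    "Information Retrieval",
    "information theory",
    "inference",
    "infinite loop",
]
counts = build_counts(toy_log)
suggestions = complete("info", counts)
```

A linear scan over all logged queries is used here only for brevity; a real system would likely index the log (for example with a trie or a sorted list) so that candidates for a prefix can be found without scanning every query.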

Citations: 152