Foundations and Trends in Information Retrieval: Latest Articles

Searching the Enterprise

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2017-07-12 DOI: 10.1561/1500000053

Udo Kruschwitz, Charlie Hull

Search has become ubiquitous, but that does not mean that search has been solved. Enterprise search, which is, broadly speaking, the use of information retrieval technology to find information within organisations, is a good example to illustrate this. It is an area of huge importance for businesses, yet it has attracted relatively little academic interest. This monograph will explore the main issues involved in enterprise search from both a research and a practical point of view. We will first plot the landscape of enterprise search and its links to related areas. This will allow us to identify key features before we survey the field in more detail. Throughout the monograph we will discuss the topic as part of the wider information retrieval research field, and we use Web search as a common reference point, as this is likely the search application area that the average reader is most familiar with.

U. Kruschwitz and C. Hull. Searching the Enterprise. Foundations and Trends® in Information Retrieval, vol. 11, no. 1, pp. 1–142, 2017. DOI: 10.1561/1500000053. Full text available at: http://dx.doi.org/10.1561/1500000053

Citations: 29

Aggregated Search

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2017-03-06 DOI: 10.1561/1500000052

Jaime Arguello

The goal of aggregated search is to provide integrated search across multiple heterogeneous search services in a unified interface: a single query box and a common presentation of results. In the web search domain, aggregated search systems are responsible for integrating results from specialized search services, or verticals, alongside the core web results. For example, search portals such as Google, Bing, and Yahoo! provide access to vertical search engines that focus on different types of media (images and video), different types of search tasks (search for local businesses and online products), and even applications that can help users complete certain tasks (language translation and math calculations). This monograph provides a comprehensive summary of previous research in aggregated search. It starts by describing why aggregated search requires unique solutions. It then discusses different sources of evidence that are likely to be available to an aggregated search system, as well as different techniques for integrating evidence in order to make vertical selection and presentation decisions. Next, it surveys different evaluation methodologies for aggregated search and discusses prior user studies that have aimed to better understand how users behave with aggregated search interfaces. It proceeds to review different advanced topics in aggregated search. It concludes by highlighting the main trends and discussing short-term and long-term areas for future work.

Citations: 20

A Survey of Query Auto Completion in Information Retrieval

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2016-09-13 DOI: 10.1561/1500000055

Fei Cai, M. de Rijke

In information retrieval, query auto completion (QAC), also known as type-ahead and auto-complete suggestion, refers to the following functionality: given a prefix consisting of a number of characters entered into a search box, the user interface proposes alternative ways of extending the prefix to a full query. QAC helps users to formulate their query when they have an intent in mind but not a clear way of expressing this in a query. It helps to avoid possible spelling mistakes, especially on devices with small screens. It saves keystrokes and cuts down the search duration of users, which implies a lower load on the search engine and results in savings in machine resources and maintenance. Because of the clear benefits of QAC, a considerable number of algorithmic approaches to QAC have been proposed in the past few years. Query logs have proven to be a key asset underlying most of the recent research. This monograph surveys this research. It focuses on summarizing the literature on QAC and provides a general understanding of the wealth of QAC approaches that are currently available. A Survey of Query Auto Completion in Information Retrieval is an ideal reference on the topic. Its contributions can be summarized as follows: It provides researchers who are working on query auto completion or related problems in the field of information retrieval with a good overview and analysis of state-of-the-art QAC approaches. In particular, for researchers new to the field, the survey can serve as an introduction to the state-of-the-art. It also offers a comprehensive perspective on QAC approaches by presenting a taxonomy of existing solutions. In addition, it presents solutions for QAC under different conditions such as available high-resolution query logs, in-depth user interactions with QAC using eye-tracking, and elaborate user engagements in a QAC process. It also discusses practical issues related to QAC. Lastly, it presents a detailed discussion of core challenges and promising open directions in QAC.

Citations: 152

Online Evaluation for Information Retrieval

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2016-06-07 DOI: 10.1561/1500000051

Katja Hofmann, Lihong Li, Filip Radlinski

Online evaluation is one of the most common approaches to measure the effectiveness of an information retrieval system. It involves fielding the information retrieval system to real users, and observing these users' interactions in-situ while they engage with the system. This allows actual users with real world information needs to play an important part in assessing retrieval quality. As such, online evaluation complements the common alternative offline evaluation approaches, which may provide more easily interpretable outcomes, yet are often less realistic when measuring quality and actual user experience. In this survey, we provide an overview of online evaluation techniques for information retrieval. We show how online evaluation is used for controlled experiments, segmenting them into experiment designs that allow absolute or relative quality assessments. Our presentation of different metrics further partitions online evaluation based on different sized experimental units commonly of interest: documents, lists and sessions. Additionally, we include an extensive discussion of recent work on data re-use, and experiment estimation based on historical data. A substantial part of this work focuses on practical issues: how to run evaluations in practice, how to select experimental parameters, how to take into account ethical considerations inherent in online evaluations, and limitations that experimenters should be aware of. While most published work on online experimentation today is at large scale in systems with millions of users, we also emphasize that the same techniques can be applied at small scale. To this end, we emphasize recent work that makes it easier to use at smaller scales and encourage studying real-world information seeking in a wide range of scenarios. Finally, we present a summary of the most recent work in the area, and describe open problems, as well as postulating future directions.

Citations: 97

Semantic Search on Text and Knowledge Bases

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2016-06-07 DOI: 10.1561/1500000032

H. Bast, Björn Buchhold, Elmar Haussmann

This article provides a comprehensive overview of the broad area of semantic search on text and knowledge bases. In a nutshell, semantic search is "search with meaning". This "meaning" can refer to various parts of the search process: understanding the query instead of just finding matches of its components in the data, understanding the data instead of just searching it for such matches, or representing knowledge in a way suitable for meaningful retrieval. Semantic search is studied in a variety of different communities with a variety of different views of the problem. In this survey, we classify this work according to two dimensions: the type of data (text, knowledge bases, combinations of these) and the kind of search (keyword, structured, natural language). We consider all nine combinations. The focus is on fundamental techniques, concrete systems, and benchmarks. The survey also considers advanced issues: ranking, indexing, ontology matching and merging, and inference. It also provides a succinct overview of fundamental natural language processing techniques: POS-tagging, named-entity recognition and disambiguation, sentence parsing, and distributional semantics. The survey is as self-contained as possible, and should thus also serve as a good tutorial for newcomers to this fascinating and highly topical field.

Citations: 149

Credibility in Information Retrieval

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2015-11-18 DOI: 10.1561/1500000046

A. Gînsca, Adrian Daniel Popescu, M. Lupu

Credibility, as the general concept covering trustworthiness and expertise, but also quality and reliability, is strongly debated in philosophy, psychology, and sociology, and its adoption in computer science is therefore fraught with difficulties. Yet its importance has grown in the information access community because of two complementing factors: on one hand, it is relatively difficult to precisely point to the source of a piece of information, and on the other hand, complex algorithms, statistical machine learning, artificial intelligence, make decisions on behalf of the users, with little oversight from the users themselves.

This survey presents a detailed analysis of existing credibility models from different information seeking research areas, with a focus on the Web and its pervasive social component. It shows that there is a very rich body of work pertaining to different aspects and interpretations of credibility, particularly for different types of textual content (e.g., Web sites, blogs, tweets), but also to other modalities (videos, images, audio) and topics (e.g., health care). After an introduction placing credibility in the context of other sciences and relating it to trust, we argue for a quartic decomposition of credibility: expertise and trustworthiness, well documented in the literature and predominantly related to information source, and quality and reliability, raised to the status of equal partners because the source is often impossible to detect, and predominantly related to the content.

The second half of the survey provides the reader with access points to the literature, grouped by research interests. Section 3 reviews general research directions: the factors that contribute to credibility assessment in human consumers of information; the models used to combine these factors; the methods to predict credibility. A smaller section is dedicated to informing users about the credibility learned from the data. Sections 4, 5, and 6 go further into details, with domain-specific credibility, social media credibility, and multimedia credibility, respectively. While each of them is best understood in the context of Sections 1 and 2, they can be read independently of each other.

The last section of this survey addresses a topic not commonly considered under "credibility": the credibility of the system itself, independent of the data creators. This is a topic of particular importance in domains where the user is professionally motivated and where there are no concerns about the credibility of the data (e.g., e-discovery and patent search). While there is little explicit work in this direction, we argue that this is an open research direction that is worthy of future exploration.

Finally, as an additional help to the reader, an appendix lists the existing test collections that cater specifically to some aspect of credibility.

Overall, this review will provide the reader with an organised and comprehensive reference guide to the state of the art and the problems at hand, rather than a final answer to the question of what credibility is for computer science. Even within the relatively limited scope of an exact science, such an answer is not possible for a concept that is itself widely debated in philosophy and the social sciences.

Citations: 39

Temporal Information Retrieval

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2015-07-01 DOI: 10.1007/springerreference_65900

Nattiya Kanhabua, Roi Blanco, Kjetil Nørvåg

Citations: 0

Search Result Diversification

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2015-02-27 DOI: 10.1561/1500000040

Rodrygo L. T. Santos, C. Macdonald, I. Ounis

Ranking in information retrieval has been traditionally approached as a pursuit of relevant information, under the assumption that the users' information needs are unambiguously conveyed by their submitted queries. Nevertheless, as an inherently limited representation of a more complex information need, every query can arguably be considered ambiguous to some extent. In order to tackle query ambiguity, search result diversification approaches have recently been proposed to produce rankings aimed to satisfy the multiple possible information needs underlying a query. In this survey, we review the published literature on search result diversification. In particular, we discuss the motivations for diversifying the search results for an ambiguous query and provide a formal definition of the search result diversification problem. In addition, we describe the most successful approaches in the literature for producing and evaluating diversity in multiple search domains. Finally, we also discuss recent advances as well as open research directions in the field of search result diversification.

Citations: 85

Temporal Information Retrieval

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2015-01-01 DOI: 10.1561/1500000043

Nattiya Kanhabua, Roi Blanco, K. Nørvåg

Citations: 7

Music Information Retrieval: Recent Developments and Applications

IF 10.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval

Pub Date : 2014-09-08 DOI: 10.1561/1500000042

M. Schedl, E. Gómez, Julián Urbano

We provide a survey of the field of Music Information Retrieval (MIR), paying particular attention to the latest developments, such as semantic auto-tagging and user-centric retrieval and recommendation approaches. We first elaborate on well-established and proven methods for feature extraction and music indexing, from both the audio signal and contextual data sources about music items, such as web pages or collaborative tags. These in turn enable a wide variety of music retrieval tasks, such as semantic music search or music identification ("query by example"). Subsequently, we review current work on user analysis and modeling in the context of music recommendation and retrieval, addressing the recent trend towards user-centric and adaptive approaches and systems. A discussion follows of how various MIR approaches to different problems are evaluated and compared. Finally, a discussion of the major open challenges concludes the survey.

Citations: 213
