Explainable Recommendation: A Survey and New Perspectives
Yongfeng Zhang, Xu Chen. Foundations and Trends in Information Retrieval, pp. 1-101, 2018. DOI: 10.1561/1500000066

Explainable recommendation attempts to develop models that generate not only high-quality recommendations but also intuitive explanations. The explanations may either be post-hoc or come directly from an explainable model (also called an interpretable or transparent model in some contexts). Explainable recommendation addresses the problem of why: by providing explanations to users or system designers, it helps humans understand why certain items are recommended by the algorithm. Explainable recommendation helps to improve the transparency, persuasiveness, effectiveness, trustworthiness, and satisfaction of recommendation systems, and it helps system designers to debug their systems. In recent years, a large number of explainable recommendation approaches, especially model-based methods, have been proposed and applied in real-world systems.

In this survey, we provide a comprehensive review of explainable recommendation research. We first highlight the position of explainable recommendation in recommender system research by categorizing recommendation problems into the 5W: what, when, who, where, and why. We then conduct a comprehensive survey of explainable recommendation from three perspectives: 1) we provide a chronological research timeline of explainable recommendation; 2) we provide a two-dimensional taxonomy to classify existing explainable recommendation research; 3) we summarize how explainable recommendation applies to different recommendation tasks. We also devote a chapter to discussing explanation perspectives in broader IR and AI/ML research. We end the survey by discussing potential future directions to promote the explainable recommendation research area and beyond.
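One classic explanation style the survey covers can be made concrete with a small sketch: an item-based recommender that justifies its suggestion by naming the liked item that triggered it ("because you liked X"). The item vectors and titles below are invented for illustration; this is not the survey's own method, just a minimal example of the idea.

```python
# Sketch of a neighbor-style explanation for an item-based recommender.
# Item feature vectors and titles are hypothetical toy data.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return num / den if den else 0.0

def recommend_with_explanation(user_likes, item_vectors):
    """Pick the unseen item most similar to a liked item, and explain why."""
    best = None  # (candidate, liked item that triggered it, similarity)
    for cand, vec in item_vectors.items():
        if cand in user_likes:
            continue
        for liked in user_likes:
            sim = cosine(item_vectors[liked], vec)
            if best is None or sim > best[2]:
                best = (cand, liked, sim)
    cand, liked, _ = best
    return cand, f"Recommended because it is similar to {liked}, which you liked."

items = {
    "The Matrix":   [1, 1, 0, 0],
    "Inception":    [1, 1, 0, 1],
    "Notting Hill": [0, 0, 1, 1],
}
rec, why = recommend_with_explanation({"The Matrix"}, items)
```

The explanation here is generated by the model itself rather than post-hoc, since the triggering neighbor is exactly the evidence the recommender used.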
Geographic Information Retrieval: Progress and Challenges in Spatial Search of Text
R. Purves, Paul D. Clough, Christopher B. Jones, M. Hall, Vanessa Murdock. Foundations and Trends in Information Retrieval, pp. 164-318, 2018. DOI: 10.1561/1500000034

Significant amounts of information available today contain references to places on earth. Traditionally such information has been held as structured data and was the concern of Geographic Information Systems (GIS). However, increasing amounts of data in the form of unstructured text that also contain spatial references are available for indexing and retrieval. This monograph describes the field of Geographic Information Retrieval (GIR), which seeks to develop spatially aware search systems and support users' geographical information needs. Important concepts with respect to storing, querying, and analysing geographical information in computers are introduced, before user needs and interaction in the context of GIR are explored. The task of associating documents with coordinates, prior to their indexing and ranking, forms the core of any GIR system, and different approaches and their implications are discussed. Evaluating the resulting systems and their components, and the different paradigms for doing so, continues to be an important area of research in GIR and is illustrated through several examples. The monograph provides an overview of the research field, and in so doing identifies key remaining research challenges in GIR.
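The core task named above, associating documents with coordinates, can be illustrated with a minimal gazetteer lookup. The place names and coordinates below form a toy gazetteer; real systems must additionally disambiguate toponyms (Paris, France vs. Paris, Texas), which this sketch ignores.

```python
# Minimal sketch of gazetteer-based geotagging: find place-name mentions in
# free text and attach coordinates from a (hypothetical) gazetteer.

GAZETTEER = {  # toy gazetteer: place name -> (latitude, longitude)
    "zurich": (47.37, 8.54),
    "sheffield": (53.38, -1.47),
    "cardiff": (51.48, -3.18),
}

def geotag(text):
    """Return (place, (lat, lon)) pairs for gazetteer entries found in text."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]

matches = geotag("The workshop moves from Zurich to Sheffield next year.")
```

Once documents carry coordinates like these, a GIR system can index them spatially and rank by geographic as well as textual relevance.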
Web Forum Retrieval and Text Analytics: A Survey
D. Hoogeveen, Li Wang, Timothy Baldwin, Karin M. Verspoor. Foundations and Trends in Information Retrieval, pp. 1-163, 2018. DOI: 10.1561/1500000062

This survey presents an overview of information retrieval, natural language processing, and machine learning research that makes use of forum data, including both discussion forums and community question answering (cQA) archives. The focus is on automated analysis, with the goal of gaining a better understanding of the data and its users. We discuss the different strategies used at the post level for both retrieval tasks (post retrieval, question retrieval, and answer retrieval) and classification tasks (post type classification, question classification, post quality assessment, subjectivity, and viewpoint classification), as well as at the thread level (thread retrieval, solvedness and task orientation, discourse structure recovery and dialogue act tagging, QA-pair extraction, and thread summarisation). We also review work on forum users, including user satisfaction, expert finding, question recommendation and routing, and community analysis. The survey includes a brief history of forums, an overview of the different kinds of forums, a summary of publicly available datasets for forum research, and a short discussion on the evaluation of retrieval tasks using forum data. The aim is to give a broad overview of the different kinds of forum research, a summary of the methods that have been applied, some insights into successful strategies, and potential areas for future research.
Applications of Topic Models
Jordan L. Boyd-Graber, Yuening Hu, David Mimno. Foundations and Trends in Information Retrieval, pp. 143-296, 2017. DOI: 10.1561/1500000030

How can a single person understand what's going on in a collection of millions of documents? This is an increasingly widespread problem: sifting through an organization's e-mails, understanding a decade's worth of newspapers, or characterizing a scientific field's research. This monograph explores the ways that humans and computers make sense of document collections through tools called topic models. Topic models are a statistical framework that helps users understand large document collections: not just to find individual documents but to understand the general themes present in the collection. Applications of Topic Models describes the recent academic and industrial applications of topic models. In addition to topic models' effective application to traditional problems like information retrieval, visualization, statistical inference, multilingual modeling, and linguistic understanding, it also reviews topic models' ability to unlock large text collections for qualitative analysis, covering their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts. The monograph is aimed at readers with some knowledge of document processing and a basic understanding of probability who are interested in a range of application domains. It discusses the information needs of each application area and how those specific needs affect models, curation procedures, and interpretations. By the end of the monograph, it is hoped that readers will be excited enough to build their own topic models. It should also be of interest to topic model experts, as the coverage of diverse applications may expose models and approaches they had not seen before.
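As a concrete illustration of the statistical framework the abstract describes, here is a toy collapsed Gibbs sampler for LDA, the canonical topic model. This is a from-scratch sketch at toy scale, not production code; the corpus, hyperparameters, and topic count are all invented for illustration.

```python
# Toy collapsed Gibbs sampler for Latent Dirichlet Allocation (LDA).
# Each token is assigned a topic; sampling repeatedly reassigns topics
# in proportion to document-topic and topic-word counts.
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics=2, iters=100, alpha=0.1, beta=0.01, seed=0):
    """docs: list of token lists. Returns per-document topic counts."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})            # vocabulary size
    ndk = [[0] * n_topics for _ in docs]             # document-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                              # topic totals
    z = []                                           # topic of each token
    for d, doc in enumerate(docs):                   # random initialization
        assign = []
        for w in doc:
            k = rng.randrange(n_topics)
            assign.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(assign)
    for _ in range(iters):                           # Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                          # remove current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                          # resample
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk

docs = [
    "star galaxy orbit telescope".split(),
    "galaxy star planet orbit".split(),
    "vote senate policy election".split(),
    "election policy vote campaign".split(),
]
doc_topic_counts = lda_gibbs(docs)
```

With a corpus this cleanly separated, the sampler tends to concentrate each document's tokens into one of the two topics, which is exactly the "general themes" view of a collection that the abstract describes.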
Aggregated Search
Jaime Arguello. Foundations and Trends in Information Retrieval, pp. 365-502, 2017. DOI: 10.1561/1500000052

The goal of aggregated search is to provide integrated search across multiple heterogeneous search services in a unified interface: a single query box and a common presentation of results. In the web search domain, aggregated search systems are responsible for integrating results from specialized search services, or verticals, alongside the core web results. For example, search portals such as Google, Bing, and Yahoo! provide access to vertical search engines that focus on different types of media (images and video), different types of search tasks (search for local businesses and online products), and even applications that can help users complete certain tasks (language translation and math calculations). This monograph provides a comprehensive summary of previous research in aggregated search. It starts by describing why aggregated search requires unique solutions. It then discusses different sources of evidence that are likely to be available to an aggregated search system, as well as different techniques for integrating evidence in order to make vertical selection and presentation decisions. Next, it surveys different evaluation methodologies for aggregated search and discusses prior user studies that have aimed to better understand how users behave with aggregated search interfaces. It proceeds to review advanced topics in aggregated search, and concludes by highlighting the main trends and discussing short-term and long-term areas for future work.
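One evidence source for the vertical selection decisions mentioned above can be sketched simply: score each vertical by the likelihood of the query under a unigram language model built from that vertical's content, and trigger the best-scoring vertical. The vertical names and term probabilities below are illustrative assumptions, not values from the monograph.

```python
# Sketch of query-likelihood evidence for vertical selection.
# Language models here are hypothetical toy unigram distributions.

def vertical_scores(query_terms, vertical_language_models):
    """Score each vertical by the query's likelihood under its unigram LM."""
    scores = {}
    for name, lm in vertical_language_models.items():
        p = 1.0
        for t in query_terms:
            p *= lm.get(t, 1e-6)  # tiny floor probability for unseen terms
        scores[name] = p
    return scores

lms = {
    "images": {"photo": 0.2, "sunset": 0.1, "wallpaper": 0.1},
    "local":  {"restaurant": 0.2, "near": 0.15, "open": 0.1},
}
scores = vertical_scores(["sunset", "photo"], lms)
best = max(scores, key=scores.get)
```

A real system would combine this with other evidence (query logs, click feedback, vertical result quality) in a learned model rather than relying on a single score.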
A Survey of Query Auto Completion in Information Retrieval
Fei Cai, M. de Rijke. Foundations and Trends in Information Retrieval, pp. 273-363, 2016. DOI: 10.1561/1500000055

In information retrieval, query auto completion (QAC), also known as type-ahead and auto-complete suggestion, refers to the following functionality: given a prefix consisting of a number of characters entered into a search box, the user interface proposes alternative ways of extending the prefix to a full query. QAC helps users to formulate their query when they have an intent in mind but not a clear way of expressing it in a query. It helps to avoid possible spelling mistakes, especially on devices with small screens. It saves keystrokes and cuts down users' search duration, which implies a lower load on the search engine and results in savings in machine resources and maintenance. Because of the clear benefits of QAC, a considerable number of algorithmic approaches to QAC have been proposed in the past few years. Query logs have proven to be a key asset underlying most of the recent research. This monograph surveys this research. It focuses on summarizing the literature on QAC and provides a general understanding of the wealth of QAC approaches that are currently available. Its contributions can be summarized as follows: it provides researchers working on query auto completion or related problems in the field of information retrieval with a good overview and analysis of state-of-the-art QAC approaches; in particular, for researchers new to the field, the survey can serve as an introduction to the state of the art. It also offers a comprehensive perspective on QAC approaches by presenting a taxonomy of existing solutions. In addition, it presents solutions for QAC under different conditions, such as the availability of high-resolution query logs, in-depth user interactions with QAC studied using eye-tracking, and elaborate user engagements in the QAC process. It also discusses practical issues related to QAC. Lastly, it presents a detailed discussion of core challenges and promising open directions in QAC.
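The functionality defined at the start of the abstract can be sketched with the simplest log-based approach: rank the completions of a prefix by their frequency in the query log (often called the most-popular-completion baseline). The query log below is invented.

```python
# Minimal QAC sketch: most-popular-completion over a toy query log.
# Sorted queries allow a prefix range scan; candidates are then ranked
# by how often they were issued as full queries.
import bisect
from collections import Counter

class MostPopularCompletion:
    def __init__(self, query_log):
        self.counts = Counter(query_log)
        self.sorted_queries = sorted(self.counts)  # lexicographic order

    def complete(self, prefix, k=3):
        """Top-k completions of prefix, ranked by past query frequency."""
        lo = bisect.bisect_left(self.sorted_queries, prefix)
        hi = bisect.bisect_left(self.sorted_queries, prefix + "\uffff")
        candidates = self.sorted_queries[lo:hi]  # all queries with this prefix
        return sorted(candidates, key=lambda q: -self.counts[q])[:k]

qac = MostPopularCompletion(
    ["new york", "new york weather", "news", "new york", "netflix"]
)
suggestions = qac.complete("new")
```

More sophisticated approaches surveyed in the monograph re-rank these candidates using time, user, and context signals rather than raw popularity alone.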
Online Evaluation for Information Retrieval
Katja Hofmann, Lihong Li, Filip Radlinski. Foundations and Trends in Information Retrieval, pp. 1-117, 2016. DOI: 10.1561/1500000051

Online evaluation is one of the most common approaches to measuring the effectiveness of an information retrieval system. It involves fielding the information retrieval system to real users and observing these users' interactions in situ while they engage with the system. This allows actual users with real-world information needs to play an important part in assessing retrieval quality. As such, online evaluation complements the common alternative of offline evaluation, which may provide more easily interpretable outcomes yet is often less realistic when measuring quality and actual user experience.

In this survey, we provide an overview of online evaluation techniques for information retrieval. We show how online evaluation is used for controlled experiments, segmenting them into experiment designs that allow absolute or relative quality assessments. Our presentation of different metrics further partitions online evaluation based on the different sized experimental units commonly of interest: documents, lists, and sessions. Additionally, we include an extensive discussion of recent work on data re-use and experiment estimation based on historical data.

A substantial part of this work focuses on practical issues: how to run evaluations in practice, how to select experimental parameters, how to take into account ethical considerations inherent in online evaluations, and limitations that experimenters should be aware of. While most published work on online experimentation today is at large scale in systems with millions of users, we also emphasize that the same techniques can be applied at small scale. To this end, we highlight recent work that makes online evaluation easier to use at smaller scales and encourage the study of real-world information seeking in a wide range of scenarios. Finally, we present a summary of the most recent work in the area, describe open problems, and postulate future directions.
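One family of relative-assessment experiment designs covered by this survey is interleaving. The sketch below implements a simplified team-draft variant (the coin flip here happens at every tie in pick counts rather than once per round, a small deviation from the standard formulation); the rankings and clicks are invented.

```python
# Simplified team-draft interleaving: merge two rankings so that each shown
# document is credited to the ranker ("team") that picked it, then credit
# clicks to teams to get a relative A-vs-B quality signal.
import random

def team_draft_interleave(ranking_a, ranking_b, seed=0):
    rng = random.Random(seed)
    shown, team = [], {}          # interleaved list; doc -> picking team
    picks = {"A": 0, "B": 0}
    while True:
        rem_a = [d for d in ranking_a if d not in team]
        rem_b = [d for d in ranking_b if d not in team]
        if not rem_a and not rem_b:
            return shown, team
        a_first = picks["A"] < picks["B"] or (
            picks["A"] == picks["B"] and rng.random() < 0.5
        )
        if rem_a and (a_first or not rem_b):
            doc, side = rem_a[0], "A"
        else:
            doc, side = rem_b[0], "B"
        shown.append(doc)
        team[doc] = side
        picks[side] += 1

def credit_clicks(team, clicked):
    """Count clicks per team; the team with more clicks wins the comparison."""
    wins = {"A": 0, "B": 0}
    for doc in clicked:
        wins[team[doc]] += 1
    return wins

shown, team = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"])
outcome = credit_clicks(team, clicked=["d1"])
```

Aggregated over many query impressions, such per-impression outcomes yield the relative quality assessment the survey contrasts with absolute (A/B-style) designs.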
Semantic Search on Text and Knowledge Bases
H. Bast, Björn Buchhold, Elmar Haussmann. Foundations and Trends in Information Retrieval, pp. 119-271, 2016. DOI: 10.1561/1500000032

This article provides a comprehensive overview of the broad area of semantic search on text and knowledge bases. In a nutshell, semantic search is "search with meaning". This "meaning" can refer to various parts of the search process: understanding the query instead of just finding matches of its components in the data, understanding the data instead of just searching it for such matches, or representing knowledge in a way suitable for meaningful retrieval.

Semantic search is studied in a variety of different communities with a variety of different views of the problem. In this survey, we classify this work according to two dimensions: the type of data (text, knowledge bases, or combinations of these) and the kind of search (keyword, structured, or natural language). We consider all nine combinations. The focus is on fundamental techniques, concrete systems, and benchmarks. The survey also considers advanced issues: ranking, indexing, ontology matching and merging, and inference. It also provides a succinct overview of fundamental natural language processing techniques: POS tagging, named-entity recognition and disambiguation, sentence parsing, and distributional semantics. The survey is as self-contained as possible and should thus also serve as a good tutorial for newcomers to this fascinating and highly topical field.
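Structured search over a knowledge base, one cell of the data/search taxonomy above, bottoms out in matching patterns against triples. Here is a minimal single-variable triple-pattern match over an invented knowledge base; real systems use SPARQL-style engines with indexing and ranking on top of this primitive.

```python
# Minimal knowledge-base sketch: subject-predicate-object triples plus a
# single-variable pattern match ("?x" marks the variable position).
# The facts below are a toy example, not a real knowledge base.

TRIPLES = [
    ("Marie_Curie", "won", "Nobel_Prize_in_Physics"),
    ("Marie_Curie", "born_in", "Warsaw"),
    ("Albert_Einstein", "won", "Nobel_Prize_in_Physics"),
]

def match(pattern):
    """Return the bindings of '?x' for every triple matching the pattern."""
    results = []
    for triple in TRIPLES:
        binding, ok = None, True
        for p, t in zip(pattern, triple):
            if p == "?x":
                binding = t
            elif p != t:
                ok = False
                break
        if ok:
            results.append(binding)
    return results

physicists = match(("?x", "won", "Nobel_Prize_in_Physics"))
```

Keyword and natural-language interfaces to knowledge bases can be seen as ways of compiling a user's input down to pattern queries of this kind.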
Credibility, as the general concept covering trustworthiness and expertise, but also quality and reliability, is strongly debated in philosophy, psychology, and sociology, and its adoption in computer science is therefore fraught with difficulties. Yet its importance has grown in the information access community because of two complementary factors: on one hand, it is relatively difficult to precisely point to the source of a piece of information, and on the other hand, complex algorithms (statistical machine learning, artificial intelligence) make decisions on behalf of the users, with little oversight from the users themselves. This survey presents a detailed analysis of existing credibility models from different information seeking research areas, with a focus on the Web and its pervasive social component. It shows that there is a very rich body of work pertaining to different aspects and interpretations of credibility, particularly for different types of textual content (e.g., Web sites, blogs, tweets), but also to other modalities (videos, images, audio) and topics (e.g., health care). After an introduction placing credibility in the context of other sciences and relating it to trust, we argue for a quartic decomposition of credibility: expertise and trustworthiness, well documented in the literature and predominantly related to the information source, and quality and reliability, raised to the status of equal partners because the source is often impossible to detect, and predominantly related to the content. The second half of the survey provides the reader with access points to the literature, grouped by research interests. Section 3 reviews general research directions: the factors that contribute to credibility assessment in human consumers of information; the models used to combine these factors; the methods to predict credibility. A smaller section is dedicated to informing users about the credibility learned from the data. 
Sections 4, 5, and 6 go further into details, with domain-specific credibility, social media credibility, and multimedia credibility, respectively. While each of them is best understood in the context of Sections 1 and 2, they can be read independently of each other. The last section of this survey addresses a topic not commonly considered under "credibility": the credibility of the system itself, independent of the data creators. This is a topic of particular importance in domains where the user is professionally motivated and where there are no concerns about the credibility of the data (e.g., e-discovery and patent search). While there is little explicit work in this direction, we argue that this is an open research direction that is worthy of future exploration. Finally, as an additional help to the reader, an appendix lists the existing test collections that cater specifically to some aspect of credibility. Overall, this review will provide the reader with an organised and comprehensive reference guide to the state of the art and t
{"title":"Credibility in Information Retrieval","authors":"A. Gînsca, Adrian Daniel Popescu, M. Lupu","doi":"10.1561/1500000046","DOIUrl":"https://doi.org/10.1561/1500000046","url":null,"abstract":"Credibility, as the general concept covering trustworthiness and expertise, but also quality and reliability, is strongly debated in philosophy, psychology, and sociology, and its adoption in computer science is therefore fraught with difficulties. Yet its importance has grown in the information access community because of two complementary factors: on one hand, it is relatively difficult to precisely point to the source of a piece of information, and on the other hand, complex algorithms (statistical machine learning, artificial intelligence) make decisions on behalf of the users, with little oversight from the users themselves. This survey presents a detailed analysis of existing credibility models from different information seeking research areas, with a focus on the Web and its pervasive social component. It shows that there is a very rich body of work pertaining to different aspects and interpretations of credibility, particularly for different types of textual content (e.g., Web sites, blogs, tweets), but also to other modalities (videos, images, audio) and topics (e.g., health care). After an introduction placing credibility in the context of other sciences and relating it to trust, we argue for a quartic decomposition of credibility: expertise and trustworthiness, well documented in the literature and predominantly related to the information source, and quality and reliability, raised to the status of equal partners because the source is often impossible to detect, and predominantly related to the content. The second half of the survey provides the reader with access points to the literature, grouped by research interests. 
Section 3 reviews general research directions: the factors that contribute to credibility assessment in human consumers of information; the models used to combine these factors; the methods to predict credibility. A smaller section is dedicated to informing users about the credibility learned from the data. Sections 4, 5, and 6 go further into details, with domain-specific credibility, social media credibility, and multimedia credibility, respectively. While each of them is best understood in the context of Sections 1 and 2, they can be read independently of each other. The last section of this survey addresses a topic not commonly considered under \"credibility\": the credibility of the system itself, independent of the data creators. This is a topic of particular importance in domains where the user is professionally motivated and where there are no concerns about the credibility of the data (e.g., e-discovery and patent search). While there is little explicit work in this direction, we argue that this is an open research direction that is worthy of future exploration. Finally, as an additional help to the reader, an appendix lists the existing test collections that cater specifically to some aspect of credibility. Overall, this review will provide the reader with an organised and comprehensive reference guide to the state of the art and t","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"62 1","pages":"355-475"},"PeriodicalIF":10.4,"publicationDate":"2015-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84903879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}