Online evaluation is one of the most common approaches to measure the effectiveness of an information retrieval system. It involves fielding the information retrieval system to real users and observing these users' interactions in situ while they engage with the system. This allows actual users with real-world information needs to play an important part in assessing retrieval quality. As such, online evaluation complements the common alternative, offline evaluation approaches, which may provide more easily interpretable outcomes yet are often less realistic when measuring quality and actual user experience. In this survey, we provide an overview of online evaluation techniques for information retrieval. We show how online evaluation is used for controlled experiments, segmenting them into experiment designs that allow absolute or relative quality assessments. Our presentation of different metrics further partitions online evaluation based on the differently sized experimental units commonly of interest: documents, lists, and sessions. Additionally, we include an extensive discussion of recent work on data reuse and on experiment estimation based on historical data. A substantial part of this work focuses on practical issues: how to run evaluations in practice, how to select experimental parameters, how to take into account the ethical considerations inherent in online evaluations, and the limitations that experimenters should be aware of. While most published work on online experimentation today is at large scale, in systems with millions of users, we also emphasize that the same techniques can be applied at small scale. To this end, we highlight recent work that makes these techniques easier to use at smaller scales, and we encourage studying real-world information seeking in a wide range of scenarios. Finally, we present a summary of the most recent work in the area, describe open problems, and postulate future directions.
Katja Hofmann, Lihong Li, Filip Radlinski. Online Evaluation for Information Retrieval. Foundations and Trends in Information Retrieval, pp. 1-117, 2016-06-07. DOI: 10.1561/1500000051
This article provides a comprehensive overview of the broad area of semantic search on text and knowledge bases. In a nutshell, semantic search is "search with meaning". This "meaning" can refer to various parts of the search process: understanding the query instead of just finding matches of its components in the data, understanding the data instead of just searching it for such matches, or representing knowledge in a way suitable for meaningful retrieval. Semantic search is studied in a variety of different communities with a variety of different views of the problem. In this survey, we classify this work according to two dimensions: the type of data (text, knowledge bases, combinations of these) and the kind of search (keyword, structured, natural language). We consider all nine combinations. The focus is on fundamental techniques, concrete systems, and benchmarks. The survey also considers advanced issues: ranking, indexing, ontology matching and merging, and inference. It also provides a succinct overview of fundamental natural language processing techniques: POS tagging, named-entity recognition and disambiguation, sentence parsing, and distributional semantics. The survey is as self-contained as possible and should thus also serve as a good tutorial for newcomers to this fascinating and highly topical field.
H. Bast, Björn Buchhold, Elmar Haussmann. Semantic Search on Text and Knowledge Bases. Foundations and Trends in Information Retrieval, pp. 119-271, 2016-06-07. DOI: 10.1561/1500000032
Credibility, as the general concept covering trustworthiness and expertise, but also quality and reliability, is strongly debated in philosophy, psychology, and sociology, and its adoption in computer science is therefore fraught with difficulties. Yet its importance has grown in the information access community because of two complementary factors: on the one hand, it is relatively difficult to precisely point to the source of a piece of information, and on the other hand, complex algorithms (statistical machine learning, artificial intelligence) make decisions on behalf of the users, with little oversight from the users themselves. This survey presents a detailed analysis of existing credibility models from different information seeking research areas, with a focus on the Web and its pervasive social component. It shows that there is a very rich body of work pertaining to different aspects and interpretations of credibility, particularly for different types of textual content (e.g., Web sites, blogs, tweets), but also to other modalities (videos, images, audio) and topics (e.g., health care). After an introduction placing credibility in the context of other sciences and relating it to trust, we argue for a four-part decomposition of credibility: expertise and trustworthiness, well documented in the literature and predominantly related to the information source, and quality and reliability, raised to the status of equal partners because the source is often impossible to detect, and predominantly related to the content. The second half of the survey provides the reader with access points to the literature, grouped by research interests. Section 3 reviews general research directions: the factors that contribute to credibility assessment in human consumers of information, the models used to combine these factors, and the methods to predict credibility. A smaller section is dedicated to informing users about the credibility learned from the data. Sections 4, 5, and 6 go into further detail, with domain-specific credibility, social media credibility, and multimedia credibility, respectively. While each of them is best understood in the context of Sections 1 and 2, they can be read independently of each other. The last section of this survey addresses a topic not commonly considered under "credibility": the credibility of the system itself, independent of the data creators. This is a topic of particular importance in domains where the user is professionally motivated and where there are no concerns about the credibility of the data (e.g., e-discovery and patent search). While there is little explicit work in this direction, we argue that this is an open research direction worthy of future exploration. Finally, as an additional help to the reader, an appendix lists the existing test collections that cater specifically to some aspect of credibility. Overall, this review will provide the reader with an organised and comprehensive reference guide to the state of the art and t
A. Gînsca, Adrian Daniel Popescu, M. Lupu. Credibility in Information Retrieval. Foundations and Trends in Information Retrieval, pp. 355-475, 2015-11-18. DOI: 10.1561/1500000046
Nattiya Kanhabua, Roi Blanco, Kjetil Nørvåg. Temporal Information Retrieval. Foundations and Trends in Information Retrieval, 2015-07-01. DOI: 10.1007/springerreference_65900
Ranking in information retrieval has traditionally been approached as a pursuit of relevant information, under the assumption that the users' information needs are unambiguously conveyed by their submitted queries. Nevertheless, as an inherently limited representation of a more complex information need, every query can arguably be considered ambiguous to some extent. To tackle query ambiguity, search result diversification approaches have recently been proposed to produce rankings aimed at satisfying the multiple possible information needs underlying a query. In this survey, we review the published literature on search result diversification. In particular, we discuss the motivations for diversifying the search results for an ambiguous query and provide a formal definition of the search result diversification problem. In addition, we describe the most successful approaches in the literature for producing and evaluating diversity in multiple search domains. Finally, we also discuss recent advances as well as open research directions in the field of search result diversification.
Rodrygo L. T. Santos, C. Macdonald, I. Ounis. Search Result Diversification. Foundations and Trends in Information Retrieval, pp. 1-90, 2015-02-27. DOI: 10.1561/1500000040
Nattiya Kanhabua, Roi Blanco, K. Nørvåg. Temporal Information Retrieval. Foundations and Trends in Information Retrieval, pp. 91-208, 2015-01-01. DOI: 10.1561/1500000043
We provide a survey of the field of Music Information Retrieval (MIR), paying particular attention to the latest developments, such as semantic auto-tagging and user-centric retrieval and recommendation approaches. We first elaborate on well-established and proven methods for feature extraction and music indexing, from both the audio signal and contextual data sources about music items, such as web pages or collaborative tags. These in turn enable a wide variety of music retrieval tasks, such as semantic music search or music identification ("query by example"). Subsequently, we review current work on user analysis and modeling in the context of music recommendation and retrieval, addressing the recent trend towards user-centric and adaptive approaches and systems. A discussion follows of the important question of how the various MIR approaches to different problems are evaluated and compared. Finally, a discussion of the major open challenges concludes the survey.
M. Schedl, E. Gómez, Julián Urbano. Music Information Retrieval: Recent Developments and Applications. Foundations and Trends in Information Retrieval, pp. 127-261, 2014-09-08. DOI: 10.1561/1500000042
We have recently observed a convergence of technologies that fosters the emergence of lifelogging as a mainstream activity. Computer storage has become significantly cheaper, and advancements in sensing technology allow for the efficient sensing of personal activities, locations, and the environment. This is best seen in the growing popularity of the quantified self movement, in which life activities are tracked using wearable sensors in the hope of better understanding human performance in a variety of tasks. This review aims to provide a comprehensive summary of lifelogging, covering its research history, current technologies, and applications. Thus far, most lifelogging research has focused on visual lifelogging, hence we maintain this focus in this review. However, we also reflect on the challenges lifelogging poses for information access and retrieval in general. This review is a suitable reference for those seeking an information retrieval scientist's perspective on lifelogging and the quantified self.
C. Gurrin, A. Smeaton, A. Doherty. LifeLogging: Personal Big Data. Foundations and Trends in Information Retrieval, pp. 1-125, 2014-06-20. DOI: 10.1561/1500000033
Kushal S. Dave, Vasudeva Varma. Computational Advertising: Techniques for Targeting Relevant Ads. Foundations and Trends in Information Retrieval, pp. 263-418, 2014-01-01. DOI: 10.1561/1500000045
Kushal S. Dave, Vasudeva Varma. Computational Advertising: Techniques for Targeting Relevant Ads. Foundations and Trends in Information Retrieval, pp. 263-418, 2014-01-01. DOI: 10.1561/9781601988331