Exploiting Entity Linking in Queries for Entity Retrieval
Faegheh Hasibi, K. Balog, Svein Erik Bratsberg
https://doi.org/10.1145/2970398.2970406

The premise of entity retrieval is to better answer search queries by returning specific entities instead of documents. Many queries mention particular entities; recognizing and linking them to the corresponding entries in a knowledge base is known as the task of entity linking in queries. In this paper we make a first attempt at bringing these two together, i.e., leveraging entity annotations of queries in the entity retrieval model. We introduce a new probabilistic component and show how it can be applied on top of any term-based entity retrieval model that can be emulated in the Markov Random Field framework, including language models, sequential dependence models, as well as their fielded variations. Using a standard entity retrieval test collection, we show that our extension brings consistent improvements over all baseline methods, including the current state of the art. We further show that our extension is robust to parameter settings.
{"title":"Exploiting Entity Linking in Queries for Entity Retrieval","authors":"Faegheh Hasibi, K. Balog, Svein Erik Bratsberg","doi":"10.1145/2970398.2970406","DOIUrl":"https://doi.org/10.1145/2970398.2970406","url":null,"abstract":"The premise of entity retrieval is to better answer search queries by returning specific entities instead of documents. Many queries mention particular entities; recognizing and linking them to the corresponding entry in a knowledge base is known as the task of entity linking in queries. In this paper we make a first attempt at bringing together these two, i.e., leveraging entity annotations of queries in the entity retrieval model. We introduce a new probabilistic component and show how it can be applied on top of any term-based entity retrieval model that can be emulated in the Markov Random Field framework, including language models, sequential dependence models, as well as their fielded variations. Using a standard entity retrieval test collection, we show that our extension brings consistent improvements over all baseline methods, including the current state-of-the-art. We further show that our extension is robust against parameter settings.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129450631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Nearest Neighbour based Transformation Functions for Text Classification: A Case Study with StackOverflow
Piyush Arora, Debasis Ganguly, G. Jones
https://doi.org/10.1145/2970398.2970426

A significant increase in the number of questions in question answering forums has led to interest in text categorization methods for classifying a newly posted question as good (suitable) or bad (otherwise) for the forum. Standard text categorization approaches, e.g. multinomial Naive Bayes, are likely to be unsuitable for this classification task because of: i) the lack of sufficient informative content in the questions due to their relatively short length; and ii) considerable vocabulary overlap between the classes. To increase the robustness of this classification task, we propose to use the neighbourhood of existing questions that are similar to the newly asked question. Instead of learning the classification boundary from the questions alone, we transform each question vector into a different one in the feature space. We explore two different neighbourhood functions: one using the discrete term space, and one using the continuous vector space of real numbers obtained from vector embeddings of documents. Experiments conducted on StackOverflow data show that our approach of using the neighbourhood transformation can improve classification accuracy by up to about 8%.
{"title":"Nearest Neighbour based Transformation Functions for Text Classification: A Case Study with StackOverflow","authors":"Piyush Arora, Debasis Ganguly, G. Jones","doi":"10.1145/2970398.2970426","DOIUrl":"https://doi.org/10.1145/2970398.2970426","url":null,"abstract":"significant increase in the number of questions in question answering forums has led to the interest in text categorization methods for classifying a newly posted question as good (suitable) or bad (otherwise) for the forum. Standard text categorization approaches, e.g. multinomial Naive Bayes, are likely to be unsuitable for this classification task because of: i) the lack of sufficient informative content in the questions due to their relatively short length; and ii) considerable vocabulary overlap between the classes. To increase the robustness of this classification task, we propose to use the neighbourhood of existing questions which are similar to the newly asked question. Instead of learning the classification boundary from the questions alone, we transform each question vector into a different one in the feature space. We explore two different neighbourhood functions using: the discrete term space, the continuous vector space of real numbers obtained from vector embeddings of documents. Experiments conducted on StackOverflow data show that our approach of using the neighborhood transformation can improve classification accuracy by up to about 8%.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128383893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Reproducibility Study of Information Retrieval Models
Peilin Yang, Hui Fang
https://doi.org/10.1145/2970398.2970415

Developing effective information retrieval models has been a long-standing challenge in Information Retrieval (IR), and significant progress has been made over the years. With the increasing number of developed retrieval functions and the release of new data collections, it becomes more difficult, if not impossible, to compare a new retrieval function with all existing retrieval functions over all available data collections. To tackle this problem, this paper describes our efforts on constructing a platform that aims to improve the reproducibility of IR research and facilitate the evaluation and comparison of retrieval functions. With the developed platform, more than 20 state-of-the-art retrieval functions have been implemented and systematically evaluated over 16 standard TREC collections (including the newly released ClueWeb datasets). Our reproducibility study leads to several interesting observations. First, the performance difference between the reproduced results and those reported in the original papers is small for most retrieval functions. Second, the optimal performance of a few representative retrieval functions is still comparable over the new TREC ClueWeb collections. Finally, the developed platform (i.e., RISE) is made publicly available so that any IR researcher can utilize it to evaluate other retrieval functions.
{"title":"A Reproducibility Study of Information Retrieval Models","authors":"Peilin Yang, Hui Fang","doi":"10.1145/2970398.2970415","DOIUrl":"https://doi.org/10.1145/2970398.2970415","url":null,"abstract":"Developing effective information retrieval models has been a long standing challenge in Information Retrieval (IR), and significant progresses have been made over the years. With the increasing number of developed retrieval functions and the release of new data collections, it becomes more difficult, if not impossible, to compare a new retrieval function with all existing retrieval functions over all available data collections. To tackle thisproblem, this paper describes our efforts on constructing a platform that aims to improve the reproducibility of IR researchand facilitate the evaluation and comparison of retrieval functions. With the developed platform, more than 20 state of the art retrieval functions have been implemented and systematically evaluated over 16 standard TREC collections (including the newly released ClueWeb datasets). Our reproducibility study leads to several interesting observations. First, the performance difference between the reproduced results and those reported in the original papers is small for most retrieval functions. Second, the optimal performance of a few representative retrieval functions is still comparable over the new TREC ClueWeb collections. Finally, the developed platform (i.e., RISE) is made publicly available so that any IR researchers would be able to utilize it to evaluate other retrieval functions.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126569781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Efficient and Effective Higher Order Proximity Modeling
Xiaolu Lu, Alistair Moffat, J. Culpepper
https://doi.org/10.1145/2970398.2970404

Bag-of-words retrieval models are widely used, and provide a robust trade-off between efficiency and effectiveness. These models often make simplifying assumptions about relations between query terms, and treat term statistics independently. However, query terms are rarely independent, and previous work has repeatedly shown that term dependencies can be critical to improving the effectiveness of ranked retrieval results. Among all term-dependency models, the Markov Random Field (MRF) [Metzler and Croft, SIGIR, 2005] model has received the most attention in recent years. Despite clear effectiveness improvements, these models are not deployed in performance-critical applications because of the potentially high computational costs. As a result, bigram models are generally considered to be the best compromise between full term dependence, and term-independent models such as BM25. Here we provide further evidence that term-dependency features not captured by bag-of-words models can reliably improve retrieval effectiveness. We also present a new variation on the highly-effective MRF model that relies on a BM25-derived potential. The benefit of this approach is that it is built from feature functions which require no higher-order global statistics. We empirically show that our new model reduces retrieval costs by up to 60%, with no loss in effectiveness compared to previous approaches.

A Simple and Effective Approach to Score Standardisation
T. Sakai
https://doi.org/10.1145/2970398.2970399

Webber, Moffat and Zobel proposed score standardisation for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs, so as to quantify how different a system is from the "average" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. While Webber et al. mapped the standardised scores to the [0, 1] range using the standard normal cumulative distribution function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.
{"title":"A Simple and Effective Approach to Score Standardisation","authors":"T. Sakai","doi":"10.1145/2970398.2970399","DOIUrl":"https://doi.org/10.1145/2970398.2970399","url":null,"abstract":"Webber, Moffat and Zobel proposed score standardization for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs so as to quantify how different a system is from the \"average\" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. WhileWebber et al. mapped the standardised scores to the [0, 1] range using a standard normal cumulative density function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"118 4-5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114048263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Who Wants to Join Me?: Companion Recommendation in Location Based Social Networks
Yi Liao, Wai Lam, Shoaib Jameel, S. Schockaert, Xing Xie
https://doi.org/10.1145/2970398.2970420

We consider the problem of identifying possible companions for a user who is planning to visit a given venue. Specifically, we study the task of predicting which of the user's current friends, in a location-based social network (LBSN), are most likely to be interested in joining the visit. An important underlying assumption of our model is that friendship relations can be clustered based on the kinds of interests that are shared by the friends. To identify these friendship types, we use a latent topic model, which moreover takes into account the geographic proximity of the user to the location of the proposed venue. To the best of our knowledge, our model is the first that addresses the task of recommending companions for a proposed activity. While a number of existing topic models can be adapted to make such predictions, we experimentally show that such methods are significantly outperformed by our model.
{"title":"Who Wants to Join Me?: Companion Recommendation in Location Based Social Networks","authors":"Yi Liao, Wai Lam, Shoaib Jameel, S. Schockaert, Xing Xie","doi":"10.1145/2970398.2970420","DOIUrl":"https://doi.org/10.1145/2970398.2970420","url":null,"abstract":"We consider the problem of identifying possible companions for a user who is planning to visit a given venue. Specifically, we study the task of predicting which of the user's current friends, in a location based social network (LBSN), are most likely to be interested in joining the visit. An important underlying assumption of our model is that friendship relations can be clustered based on the kinds of interests that are shared by the friends. To identify these friendship types, we use a latent topic model, which moreover takes into account the geographic proximity of the user to the location of the proposed venue. To the best of our knowledge, our model is the first that addresses the task of recommending companions for a proposed activity. While a number of existing topic models can be adapted to make such predictions, we experimentally show that such methods are significantly outperformed by our model.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122678810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Understanding the Message of Images with Knowledge Base Traversals
Lydia Weiland, Ioana Hulpus, Simone Paolo Ponzetto, Laura Dietz
https://doi.org/10.1145/2970398.2970414

The message of news articles is often supported by the pointed use of iconic images. These images, together with their captions, encourage emotional involvement of the reader. Current algorithms for understanding the semantics of news articles focus on their text, often ignoring the image. On the other hand, work that targets the semantics of images mostly focuses on recognizing and enumerating the objects that appear in the image. In this work, we explore the problem from another perspective: can we devise algorithms to understand the message encoded by images and their captions? To answer this question, we study how well algorithms can describe an image-caption pair in terms of Wikipedia entities, thereby casting the problem as an entity-ranking task with an image-caption pair as the query. Our proposed algorithm brings together aspects of entity linking, subgraph selection, entity clustering, relatedness measures, and learning-to-rank. In our experiments, we focus on media-iconic image-caption pairs, which often reflect complex subjects such as sustainable energy and endangered species. Our test collection includes a gold standard of over 300 image-caption pairs about topics at different levels of abstraction. We show that, with a MAP of 0.69, the best results are obtained when aggregating content-based and graph-based features in a Wikipedia-derived knowledge base.
{"title":"Understanding the Message of Images with Knowledge Base Traversals","authors":"Lydia Weiland, Ioana Hulpus, Simone Paolo Ponzetto, Laura Dietz","doi":"10.1145/2970398.2970414","DOIUrl":"https://doi.org/10.1145/2970398.2970414","url":null,"abstract":"The message of news articles is often supported by the pointed use of iconic images. These images together with their captions encourage emotional involvement of the reader. Current algorithms for understanding the semantics of news articles focus on its text, often ignoring the image. On the other side, works that target the semantics of images, mostly focus on recognizing and enumerating the objects that appear in the image. In this work, we explore the problem from another perspective: Can we devise algorithms to understand the message encoded by images and their captions? To answer this question, we study how well algorithms can describe an image-caption pair in terms of Wikipedia entities, thereby casting the problem as an entity-ranking task with an image-caption pair as query. Our proposed algorithm brings together aspects of entity linking, subgraph selection, entity clustering, relatedness measures, and learning-to-rank. In our experiments, we focus on media-iconic image-caption pairs which often reflect complex subjects such as sustainable energy and endangered species. Our test collection includes a gold standard of over 300 image-caption pairs about topics at different levels of abstraction. We show that with a MAP of 0.69, the best results are obtained when aggregating content-based and graph-based features in a Wikipedia-derived knowledge base.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114130719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Topical Approach to Retrievability Bias Estimation
C. Wilkie, L. Azzopardi
https://doi.org/10.1145/2970398.2970437

Retrievability is an independent evaluation measure that offers insight into an aspect of retrieval systems that performance and efficiency measures do not capture. Retrievability is often used to calculate the retrievability bias, an indication of how accessible a system makes all the documents in a collection. Generally, computing the retrievability bias of a system requires a colossal number of queries to be issued for the system to gain an accurate estimate of the bias. However, it is often the relationship between the estimated bias and performance when tuning a system's parameters that matters, rather than the accuracy of the estimate itself. As such, reaching a stable estimate of bias for the system is more important than obtaining very accurate retrievability scores for individual documents. This work explores the idea of using topical subsets of the collection for query generation and bias estimation, forming a local estimate of bias that correlates with the global estimate of retrievability bias. By using topical subsets, it would be possible to reduce the volume of queries required to reach an accurate estimate of retrievability bias, reducing the time and resources required to perform a retrievability analysis. Our findings suggest that this is a viable approach to estimating retrievability bias and that the number of queries required can be reduced to less than a quarter of what was previously thought necessary.
{"title":"A Topical Approach to Retrievability Bias Estimation","authors":"C. Wilkie, L. Azzopardi","doi":"10.1145/2970398.2970437","DOIUrl":"https://doi.org/10.1145/2970398.2970437","url":null,"abstract":"Retrievability is an independent evaluation measure that offers insights to an aspect of retrieval systems that performance and efficiency measures do not. Retrievability is often used to calculate the retrievability bias, an indication of how accessible a system makes all the documents in a collection. Generally, computing the retrievability bias of a system requires a colossal number of queries to be issued for the system to gain an accurate estimate of the bias. However, it is often the case that the accuracy of the estimate is not of importance, but the relationship between the estimate of bias and performance when tuning a systems parameters. As such, reaching a stable estimation of bias for the system is more important than getting very accurate retrievability scores for individual documents. This work explores the idea of using topical subsets of the collection for query generation and bias estimation to form a local estimate of bias which correlates with the global estimate of retrievability bias. By using topical subsets, it would be possible to reduce the volume of queries required to reach an accurate estimate of retrievability bias, reducing the time and resources required to perform a retrievability analysis. Findings suggest that this is a viable approach to estimating retrievability bias and that the number of queries required can be reduced to less than a quarter of what was previously thought necessary.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114339287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Advances in Formal Models of Search and Search Behaviour
L. Azzopardi, G. Zuccon
https://doi.org/10.1145/2970398.2970440

Searching is performed in the context of a task, and as such the value of the information found is relative to that task. Recently, there has been a drive towards developing formal models of information seeking and retrieval that consider the costs and benefits arising through the interaction with the interface/system and the information surfaced during that interaction. In this full-day tutorial we will focus on describing and explaining some of the most recent formal models of Information Seeking and Retrieval. The tutorial is structured into two parts. In the first part we will present a series of models that have been developed based on: (i) economic theory, (ii) decision theory, (iii) game theory, and (iv) optimal foraging theory. The second part of the day will be dedicated to building models, where we will discuss different techniques to build and develop models from which we can draw testable hypotheses. During the tutorial, participants will be challenged to develop various formal models, applying the techniques learnt during the day. We will then conclude with presentations of solutions, followed by a summary and an overview of challenges and future directions. This tutorial is aimed at participants wanting to know more about the various formal models of information seeking, search and retrieval that have been proposed. The tutorial will be presented at an intermediate level, and is designed to support participants who want to be able to understand and build such models.
{"title":"Advances in Formal Models of Search and Search Behaviour","authors":"L. Azzopardi, G. Zuccon","doi":"10.1145/2970398.2970440","DOIUrl":"https://doi.org/10.1145/2970398.2970440","url":null,"abstract":"Searching is performed in the context of a task and as such the value of the information found is with respect to the task. Recently, there has been a drive to developing formal models of information seeking and retrieval that consider the costs and benefits arising through the interaction with the interface/system and the information surfaced during that interaction. In this full day tutorial we will focus on describing and explaining some of the more recent and latest formal models of Information Seeking and Retrieval. The tutorial is structured into two parts. In the first part we will present a series of models that have been developed based on: (i) economic theory, (ii) decision theory (iii) game theory and (iv) optimal foraging theory. The second part of the day will be dedicated to building models where we will discuss different techniques to build and develop models from which we can draw testable hypotheses from. During the tutorial participants will be challenged to develop various formals models, applying the techniques learnt during the day. We will then conclude with presentations on solutions followed by a summary and overview of challenges and future directions. This tutorial is aimed at participants wanting to know more about the various formal models of information seeking, search and retrieval, that have been proposed. The tutorial will be presented at an intermediate level, and is designed to support participants who want to be able to understand and build such models.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126552190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

The Effect of Document Order and Topic Difficulty on Assessor Agreement
T. T. Damessie, Falk Scholer, K. Järvelin, J. Culpepper
https://doi.org/10.1145/2970398.2970431

Human relevance judgments are a key component for measuring the effectiveness of information retrieval systems using test collections. Since relevance is not an absolute concept, human assessors can disagree on particular topic-document pairs for a variety of reasons. In this work we investigate the effect that document presentation order has on inter-rater agreement, comparing two presentation ordering approaches similar to those used in IR evaluation campaigns: decreasing relevance order and document identifier order. We make a further distinction between "easy" topics and "hard" topics in order to explore system effects on inter-rater agreement. The results of our pilot user study indicate that assessor agreement is higher when documents are judged in document identifier order. In addition, there is higher overall agreement on easy topics than on hard topics.
{"title":"The Effect of Document Order and Topic Difficulty on Assessor Agreement","authors":"T. T. Damessie, Falk Scholer, K. Järvelin, J. Culpepper","doi":"10.1145/2970398.2970431","DOIUrl":"https://doi.org/10.1145/2970398.2970431","url":null,"abstract":"Human relevance judgments are a key component for measuring the effectiveness of information retrieval systems using test collections. Since relevance is not an absolute concept, human assessors can disagree on particular topic-document pairs for a variety of reasons. In this work we investigate the effect that document presentation order has on inter-rater agreement, comparing two presentation ordering approaches similar to those used in IR evaluation campaigns: decreasing relevance order and document identifier order. We make a further distinction between \"easy\" topics and \"hard\" topics in order to explore system effects on inter-rater agreement. The results of our pilot user study indicate that assessor agreement is higher when documents are judged in document identifier order. In addition, there is higher overall agreement on easy topics than on hard topics.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130628075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}