Understanding Temporal Query Intent
Mohammed Hasanuzzaman, S. Saha, G. Dias, S. Ferrari
DOI: 10.1145/2766462.2767792

Understanding the temporal orientation of web search queries is an important issue for the success of information access systems. In this paper, we propose a multi-objective ensemble learning solution that (1) accurately classifies queries by their temporal intent and (2) identifies a set of well-performing solutions, thus offering a wide range of possible applications. Experiments show that a correct representation of the problem can lead to substantial classification improvements over recent state-of-the-art solutions and baseline ensemble techniques.
{"title":"Understanding Temporal Query Intent","authors":"Mohammed Hasanuzzaman, S. Saha, G. Dias, S. Ferrari","doi":"10.1145/2766462.2767792","DOIUrl":"https://doi.org/10.1145/2766462.2767792","url":null,"abstract":"Understanding the temporal orientation of web search queries is an important issue for the success of information access systems. In this paper, we propose a multi-objective ensemble learning solution that (1) allows to accurately classify queries along their temporal intent and (2) identifies a set of performing solutions thus offering a wide range of possible applications. Experiments show that correct representation of the problem can lead to great classification improvements when compared to recent state-of-the-art solutions and baseline ensemble techniques.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"18 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120970822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Practical Lessons for Gathering Quality Labels at Scale
Omar Alonso
DOI: 10.1145/2766462.2776778

Information retrieval researchers and engineers use human computation as a mechanism to produce labeled data sets for product development, research, and experimentation. To gather useful results, a successful labeling task relies on many different elements: clear instructions, user interface guidelines, representative high-quality datasets, appropriate inter-rater agreement metrics, work quality checks, and channels for worker feedback. Furthermore, designing and implementing tasks that produce and consume thousands or millions of labels differs from conducting small-scale research investigations. In this paper we present a perspective on collecting high-quality labels, with an emphasis on practical problems and scalability. We focus on three main topics: programming crowds, debugging tasks with low agreement, and algorithms for quality control. We show examples from an industrial setting.
{"title":"Practical Lessons for Gathering Quality Labels at Scale","authors":"Omar Alonso","doi":"10.1145/2766462.2776778","DOIUrl":"https://doi.org/10.1145/2766462.2776778","url":null,"abstract":"Information retrieval researchers and engineers use human computation as a mechanism to produce labeled data sets for product development, research and experimentation. To gather useful results, a successful labeling task relies on many different elements: clear instructions, user interface guidelines, representative high-quality datasets, appropriate inter-rater agreement metrics, work quality checks, and channels for worker feedback. Furthermore, designing and implementing tasks that produce and use several thousands or millions of labels is different than conducting small scale research investigations. In this paper we present a perspective for collecting high quality labels with an emphasis on practical problems and scalability. We focus on three main topics: programming crowds, debugging tasks with low agreement, and algorithms for quality control. We show examples from an industrial setting.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126780736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Linse: A Distributional Semantics Entity Search Engine
J. Sales, A. Freitas, S. Handschuh, Brian Davis
DOI: 10.1145/2766462.2767871

Entering 'Football Players from United States' when searching for 'American Footballers' is an example of vocabulary mismatch, which occurs when different words are used to express the same concepts. To address this phenomenon for entity search targeting descriptors of complex categories, we propose a compositional-distributional semantics entity search engine, which extracts semantic and commonsense knowledge from large-scale corpora to bridge the vocabulary gap between query and data.
{"title":"Linse: A Distributional Semantics Entity Search Engine","authors":"J. Sales, A. Freitas, S. Handschuh, Brian Davis","doi":"10.1145/2766462.2767871","DOIUrl":"https://doi.org/10.1145/2766462.2767871","url":null,"abstract":"Entering 'Football Players from United States' when searching for 'American Footballers' is an example of vocabulary mismatch, which occurs when different words are used to express the same concepts. In order to address this phenomenon for entity search targeting descriptors for complex categories, we propose a compositional-distributional semantics entity search engine, which extracts semantic and commonsense knowledge from large-scale corpora to address the vocabulary gap between query and data.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121483555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sign-Aware Periodicity Metrics of User Engagement for Online Search Quality Evaluation
Alexey Drutsa
DOI: 10.1145/2766462.2767814

Modern Internet companies continually refine the evaluation criteria of their data-driven decision making, which is based on online controlled experiments (also known as A/B tests). Amplitude metrics of user engagement are known to be highly sensitive to service changes, but they cannot be used to determine whether the treatment effect is positive or negative. We propose to overcome this sign-agnostic issue by paying attention to the phase of the corresponding DFT sine wave. We refine the amplitude metrics of the first frequency with the phase ones and formalize our intuition in several novel overall evaluation criteria. These criteria are then verified over A/B experiments on real users of Yandex. We find that our approach retains the sensitivity level of the amplitudes while making their changes sign-aware with respect to the treatment effect.
{"title":"Sign-Aware Periodicity Metrics of User Engagement for Online Search Quality Evaluation","authors":"Alexey Drutsa","doi":"10.1145/2766462.2767814","DOIUrl":"https://doi.org/10.1145/2766462.2767814","url":null,"abstract":"Modern Internet companies improve evaluation criteria of their data-driven decision-making that is based on online controlled experiments (also known as A/B tests). The amplitude metrics of user engagement are known to be well sensitive to service changes, but they could not be used to determine, whether the treatment effect is positive or negative. We propose to overcome this sign-agnostic issue by paying attention to the phase of the corresponding DFT sine wave. We refine the amplitude metrics of the first frequency by the phase ones and formalize our intuition in several novel overall evaluation criteria. These criteria are then verified over A/B experiments on real users of Yandex. We find that our approach holds the sensitivity level of the amplitudes and makes their changes sign-aware w.r.t. the treatment effect.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129189861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic Feature Generation on Heterogeneous Graph for Music Recommendation
Chun Guo, Xiaozhong Liu
DOI: 10.1145/2766462.2767808

Online music streaming services (MSS) have experienced exponential growth over the past decade. The giant MSS providers have not only built massive music collections with metadata, they have also accumulated large amounts of heterogeneous data generated by users, e.g., listening histories, comments, bookmarks, and user-generated playlists. While various kinds of user data can potentially be used to enhance music recommendation performance, most existing studies have focused only on audio content features and on collaborative filtering approaches based on simple user listening history or music ratings. In this paper, we propose a novel approach to the music recommendation problem by means of heterogeneous graph mining. Meta-path-based features are automatically generated from a content-rich heterogeneous graph schema with 6 types of nodes and 16 types of relations, and a learning-to-rank approach integrates the different features for recommendation. Experimental results show that the automatically generated graph features significantly (p<0.0001) improve on a state-of-the-art collaborative filtering algorithm.
{"title":"Automatic Feature Generation on Heterogeneous Graph for Music Recommendation","authors":"Chun Guo, Xiaozhong Liu","doi":"10.1145/2766462.2767808","DOIUrl":"https://doi.org/10.1145/2766462.2767808","url":null,"abstract":"Online music streaming services (MSS) experienced exponential growth over the past decade. The giant MSS providers not only built massive music collection with metadata, they also accumulated large amount of heterogeneous data generated from users, e.g. listening history, comment, bookmark, and user generated playlist. While various kinds of user data can potentially be used to enhance the music recommendation performance, most existing studies only focused on audio content features and collaborative filtering approaches based on simple user listening history or music rating. In this paper, we propose a novel approach to solve the music recommendation problem by means of heterogeneous graph mining. Meta-path based features are automatically generated from a content-rich heterogeneous graph schema with 6 types of nodes and 16 types of relations. Meanwhile, we use learning-to-rank approach to integrate different features for music recommendation. Experiment results show that the automatically generated graphical features significantly (p<0.0001) enhance state-of-the-art collaborative filtering algorithm.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130434673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How many results per page? A Study of SERP Size, Search Behavior and User Experience
D. Kelly, L. Azzopardi
DOI: 10.1145/2766462.2767732

The provision of "ten blue links" has emerged as the standard for the design of search engine result pages (SERPs). While numerous aspects of SERPs have been examined, little attention has been paid to the number of results displayed per page. This paper investigates the relationships among the number of results shown on a SERP, search behavior and user experience. We performed a laboratory experiment with 36 subjects, who were randomly assigned to use one of three search interfaces that varied according to the number of results per SERP (three, six or ten). We found subjects' click distributions differed significantly depending on SERP size. We also found those who interacted with three results per page viewed significantly more SERPs per query; interestingly, the number of SERPs they viewed per query corresponded to about 10 search results. Subjects who interacted with ten results per page viewed and saved significantly more documents. They also reported the greatest difficulty finding relevant documents, rated their skills the lowest and reported greater workload, even though these differences were not significant. This work shows that behavior changes with SERP size, such that more time is spent focused on earlier results when SERP size decreases.
{"title":"How many results per page?: A Study of SERP Size, Search Behavior and User Experience","authors":"D. Kelly, L. Azzopardi","doi":"10.1145/2766462.2767732","DOIUrl":"https://doi.org/10.1145/2766462.2767732","url":null,"abstract":"The provision of \"ten blue links\" has emerged as the standard for the design of search engine result pages (SERPs). While numerous aspects of SERPs have been examined, little attention has been paid to the number of results displayed per page. This paper investigates the relationships among the number of results shown on a SERP, search behavior and user experience. We performed a laboratory experiment with 36 subjects, who were randomly assigned to use one of three search interfaces that varied according to the number of results per SERP (three, six or ten). We found subjects' click distributions differed significantly depending on SERP size. We also found those who interacted with three results per page viewed significantly more SERPs per query; interestingly, the number of SERPs they viewed per query corresponded to about 10 search results. Subjects who interacted with ten results per page viewed and saved significantly more documents. They also reported the greatest difficulty finding relevant documents, rated their skills the lowest and reported greater workload, even though these differences were not significant. This work shows that behavior changes with SERP size, such that more time is spent focused on earlier results when SERP size decreases.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130747075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Quantifying the Impact of Non-Uniform Information Access in Collaborative Information Retrieval
N. Htun, Martin Halvey, L. Baillie
DOI: 10.1145/2766462.2767779

The majority of research into Collaborative Information Retrieval (CIR) has assumed uniformity of information access and visibility between collaborators. However, in a number of real-world scenarios, such as security and health, information access is not uniform between all collaborators in a team. This can be referred to as Multi-Level Collaborative Information Retrieval (MLCIR). To the best of our knowledge, there has not yet been any systematic investigation of the effect of MLCIR on search outcomes. To address this shortcoming, we present in this paper the results of a simulated evaluation conducted over 4 different non-uniform information access scenarios and 3 different collaborative search strategies. Results indicate that there is some tolerance to removing access to parts of the collection and that the impact on performance is not always negative. We also highlight how the different access scenarios and search strategies affect search outcomes.
{"title":"Towards Quantifying the Impact of Non-Uniform Information Access in Collaborative Information Retrieval","authors":"N. Htun, Martin Halvey, L. Baillie","doi":"10.1145/2766462.2767779","DOIUrl":"https://doi.org/10.1145/2766462.2767779","url":null,"abstract":"The majority of research into Collaborative Information Retrieval (CIR) has assumed a uniformity of information access and visibility between collaborators. However in a number of real world scenarios, information access is not uniform between all collaborators in a team e.g. security, health etc. This can be referred to as Multi-Level Collaborative Information Retrieval (MLCIR). To the best of our knowledge, there has not yet been any systematic investigation of the effect of MLCIR on search outcomes. To address this shortcoming, in this paper, we present the results of a simulated evaluation conducted over 4 different non-uniform information access scenarios and 3 different collaborative search strategies. Results indicate that there is some tolerance to removing access to the collection and that there may not always be a negative impact on performance. We also highlight how different access scenarios and search strategies impact on search outcomes.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131061865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Features of Disagreement Between Retrieval Effectiveness Measures
Timothy Jones, Paul Thomas, Falk Scholer, M. Sanderson
DOI: 10.1145/2766462.2767824

Many IR effectiveness measures are motivated by intuition, theory, or user studies, and in general most effectiveness measures are well correlated with each other. But what about the cases where they do not correlate? Which rankings cause measures to disagree? Are these rankings predictable for particular pairs of measures? In this work, we examine how and where metrics disagree, and we identify differences that should be considered when selecting metrics for evaluating retrieval systems.
{"title":"Features of Disagreement Between Retrieval Effectiveness Measures","authors":"Timothy Jones, Paul Thomas, Falk Scholer, M. Sanderson","doi":"10.1145/2766462.2767824","DOIUrl":"https://doi.org/10.1145/2766462.2767824","url":null,"abstract":"Many IR effectiveness measures are motivated from intuition, theory, or user studies. In general, most effectiveness measures are well correlated with each other. But, what about where they don't correlate? Which rankings cause measures to disagree? Are these rankings predictable for particular pairs of measures? In this work, we examine how and where metrics disagree, and identify differences that should be considered when selecting metrics for use in evaluating retrieval systems.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131208558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Head-Weighted Gap-Sensitive Correlation Coefficient
Ning Gao, Douglas W. Oard
DOI: 10.1145/2766462.2767793

Information retrieval systems rank documents, and shared-task evaluations yield results that can be used to rank information retrieval systems. Comparing rankings in ways that can yield useful insights is thus an important capability. When making such comparisons, it is often useful to give greater weight to comparisons near the head of a ranked list than to what happens further down; this is the focus of the widely used τAP measure. When scores are available, gap-sensitive measures give greater weight to larger differences than to smaller ones; this is the focus of the widely used Pearson correlation measure (ρ). This paper introduces a new measure, τGAP, which combines both features. System comparisons from the TREC 5 Ad Hoc track are used to illustrate the differences in emphasis achieved by τAP, ρ, and the proposed τGAP.
Untangling Result List Refinement and Ranking Quality: a Framework for Evaluation and Prediction
Jiyin He, M. Bron, A. D. Vries, L. Azzopardi, M. de Rijke
DOI: 10.1145/2766462.2767740

Traditional batch evaluation metrics assume that user interaction with search results is limited to scanning down a ranked list. However, modern search interfaces come with additional elements that support result list refinement (RLR) through facets and filters, making user search behavior increasingly dynamic. We develop an evaluation framework that takes a step beyond the interaction assumption of traditional evaluation metrics and allows for batch evaluation of systems with and without RLR elements. In our framework, we model user interaction as switching between different sublists. This provides a measure of user effort based on the joint effect of interaction with RLR elements and result quality. We validate the framework in a user study, comparing model predictions with real user performance: the predictions show a significant positive correlation with real user effort and, in contrast to traditional evaluation metrics, reflect our study's findings about when users stand to benefit from RLR elements. Finally, we use the framework to investigate under what conditions systems with and without RLR elements are likely to be effective, simulating varying conditions of ranking quality, user, task, and interface properties; this demonstrates a cost-effective way to study whole-system performance.
{"title":"Untangling Result List Refinement and Ranking Quality: a Framework for Evaluation and Prediction","authors":"Jiyin He, M. Bron, A. D. Vries, L. Azzopardi, M. de Rijke","doi":"10.1145/2766462.2767740","DOIUrl":"https://doi.org/10.1145/2766462.2767740","url":null,"abstract":"Traditional batch evaluation metrics assume that user interaction with search results is limited to scanning down a ranked list. However, modern search interfaces come with additional elements supporting result list refinement (RLR) through facets and filters, making user search behavior increasingly dynamic. We develop an evaluation framework that takes a step beyond the interaction assumption of traditional evaluation metrics and allows for batch evaluation of systems with and without RLR elements. In our framework we model user interaction as switching between different sublists. This provides a measure of user effort based on the joint effect of user interaction with RLR elements and result quality. We validate our framework by conducting a user study and comparing model predictions with real user performance. Our model predictions show significant positive correlation with real user effort. Further, in contrast to traditional evaluation metrics, the predictions using our framework, of when users stand to benefit from RLR elements, reflect findings from our user study. Finally, we use the framework to investigate under what conditions systems with and without RLR elements are likely to be effective. We simulate varying conditions concerning ranking quality, users, task and interface properties demonstrating a cost-effective way to study whole system performance.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132957416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}