{"title":"Session details: Session 4B: Recommending","authors":"Paul Bennett","doi":"10.1145/3255925","DOIUrl":"https://doi.org/10.1145/3255925","url":null,"abstract":"","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133152645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study the problem of subsequence search in databases of event-interval sequences, or e-sequences. In contrast to sequences of instantaneous events, e-sequences contain events that have a duration. In Information Retrieval applications, e-sequences are used for American Sign Language. We show that the subsequence-search problem is NP-hard and provide an exact (worst-case exponential) algorithm. We extend our algorithm to handle different cases of subsequence matching with errors. We then propose the Relation Index, a scheme for speeding up exact retrieval, which we benchmark against several indexing schemes.
{"title":"Subsequence Search in Event-Interval Sequences","authors":"Orestis Kostakis, A. Gionis","doi":"10.1145/2766462.2767778","DOIUrl":"https://doi.org/10.1145/2766462.2767778","url":null,"abstract":"We study the problem of subsequence search in databases of event-interval sequences, or e-sequences. In contrast to sequences of instantaneous events, e-sequences contain events that have a duration. In Information Retrieval applications, e-sequences are used for American Sign Language. We show that the subsequence-search problem is NP-hard and provide an exact (worst-case exponential) algorithm. We extend our algorithm to handle different cases of subsequence matching with errors. We then propose the Relation Index, a scheme for speeding up exact retrieval, which we benchmark against several indexing schemes.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134564016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Considerable work in web page classification has focused on incorporating the topical structure of the web (e.g., the hyperlink graph) to improve prediction accuracy. However, the majority of work has primarily focused on relational or graph-based methods that are impractical to run at scale or in an online environment. This raises the question of whether it is possible to leverage the topical structure of the web while incurring nearly no additional prediction-time cost. To this end, we introduce an approach which adjusts a page content-only classification from that obtained with a global prior to the posterior obtained by incorporating a prior which reflects the topic cohesion of the site. Using ODP data, we empirically demonstrate that our approach yields significant performance increases over a range of topics.
{"title":"Modeling Website Topic Cohesion at Scale to Improve Webpage Classification","authors":"D. Eswaran, Paul N. Bennett, Joseph J. Pfeiffer","doi":"10.1145/2766462.2767834","DOIUrl":"https://doi.org/10.1145/2766462.2767834","url":null,"abstract":"Considerable work in web page classification has focused on incorporating the topical structure of the web (e.g., the hyperlink graph) to improve prediction accuracy. However, the majority of work has primarily focused on relational or graph-based methods that are impractical to run at scale or in an online environment. This raises the question of whether it is possible to leverage the topical structure of the web while incurring nearly no additional prediction-time cost. To this end, we introduce an approach which adjusts a page content-only classification from that obtained with a global prior to the posterior obtained by incorporating a prior which reflects the topic cohesion of the site. Using ODP data, we empirically demonstrate that our approach yields significant performance increases over a range of topics.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129805259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
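The prior adjustment described in the abstract above can be illustrated with the standard prior-shift identity, p_site(c | x) ∝ p_global(c | x) · π_site(c) / π_global(c). The following is a minimal sketch under that assumption, not the authors' implementation; the function name `adjust_for_site_prior`, the topic labels, and all numbers are hypothetical.

```python
from typing import Dict


def adjust_for_site_prior(
    posterior_global: Dict[str, float],
    global_prior: Dict[str, float],
    site_prior: Dict[str, float],
) -> Dict[str, float]:
    """Re-weight a content-only posterior from a global prior to a site prior.

    Assumes the class-conditional likelihoods are unchanged, so each topic's
    posterior is scaled by site_prior / global_prior and then renormalized.
    """
    unnormalized = {
        topic: p * site_prior[topic] / global_prior[topic]
        for topic, p in posterior_global.items()
    }
    total = sum(unnormalized.values())
    return {topic: value / total for topic, value in unnormalized.items()}


# Hypothetical numbers: page content alone slightly favors "Sports",
# but the hosting site is mostly about "Health".
page_posterior = {"Sports": 0.6, "Health": 0.4}
global_prior = {"Sports": 0.5, "Health": 0.5}
site_prior = {"Sports": 0.1, "Health": 0.9}

adjusted = adjust_for_site_prior(page_posterior, global_prior, site_prior)
print(adjusted)
```

Because the adjustment is a per-topic multiplication followed by a normalization, it adds almost nothing to prediction-time cost, which is consistent with the goal stated in the abstract. How the site-level prior is estimated is not covered by this sketch.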
Many search engine users attempt to satisfy an information need by issuing multiple queries, with the expectation that each result will contribute some portion of the required information. Previous research has shown that structured or semi-structured descriptive knowledge bases (such as Wikipedia) can be used to improve search quality and experience for general or entity-centric queries. However, such resources do not have sufficient coverage of procedural knowledge, i.e. what actions should be performed and what factors should be considered to achieve some goal; such procedural knowledge is crucial when responding to task-oriented search queries. This paper provides a first attempt to bridge the gap between two evolving research areas: development of procedural knowledge bases (such as wikiHow) and task-oriented search. We investigate whether task-oriented search can benefit from existing procedural knowledge (search task suggestion) and whether automatic procedural knowledge construction can benefit from users' search activities (automatic procedural knowledge base construction). We propose to create a three-way parallel corpus of queries, query contexts, and task descriptions, and reduce both problems to sequence labeling tasks. We propose a set of textual features and structural features to identify key search phrases from task descriptions, and then adapt similar features to extract wikiHow-style procedural knowledge descriptions from search queries and relevant text snippets. We compare our proposed solution with baseline algorithms, commercial search engines, and the (manually-curated) wikiHow procedural knowledge; experimental results show an improvement of +0.28 to +0.41 in terms of Precision@8 and mean average precision (MAP).
{"title":"Leveraging Procedural Knowledge for Task-oriented Search","authors":"Zi Yang, Eric Nyberg","doi":"10.1145/2766462.2767744","DOIUrl":"https://doi.org/10.1145/2766462.2767744","url":null,"abstract":"Many search engine users attempt to satisfy an information need by issuing multiple queries, with the expectation that each result will contribute some portion of the required information. Previous research has shown that structured or semi-structured descriptive knowledge bases (such as Wikipedia) can be used to improve search quality and experience for general or entity-centric queries. However, such resources do not have sufficient coverage of procedural knowledge, i.e. what actions should be performed and what factors should be considered to achieve some goal; such procedural knowledge is crucial when responding to task-oriented search queries. This paper provides a first attempt to bridge the gap between two evolving research areas: development of procedural knowledge bases (such as wikiHow) and task-oriented search. We investigate whether task-oriented search can benefit from existing procedural knowledge (search task suggestion) and whether automatic procedural knowledge construction can benefit from users' search activities (automatic procedural knowledge base construction). We propose to create a three-way parallel corpus of queries, query contexts, and task descriptions, and reduce both problems to sequence labeling tasks. We propose a set of textual features and structural features to identify key search phrases from task descriptions, and then adapt similar features to extract wikiHow-style procedural knowledge descriptions from search queries and relevant text snippets. We compare our proposed solution with baseline algorithms, commercial search engines, and the (manually-curated) wikiHow procedural knowledge; experimental results show an improvement of +0.28 to +0.41 in terms of Precision@8 and mean average precision (MAP).","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132199745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chao Chen, Dongsheng Li, Yingying Zhao, Q. Lv, L. Shang
Matrix approximation is one of the most effective methods for collaborative filtering-based recommender systems. However, the high computation complexity of matrix factorization on large datasets limits its scalability. Prior solutions have adopted co-clustering methods to partition a large matrix into a set of smaller submatrices, which can then be processed in parallel to improve scalability. The drawback is that the recommendation accuracy is lower as the submatrices only contain subsets of the user-item rating information. This paper presents WEMAREC, a weighted and ensemble matrix approximation method for accurate and scalable recommendation. It builds upon the intuition that (sub)matrices containing more frequent samples of certain user/item/rating tend to make more reliable rating predictions for these specific user/item/rating. WEMAREC consists of two important components: (1) a weighting strategy that is computed based on the rating distribution in each submatrix and applied to approximate a single matrix containing those submatrices; and (2) an ensemble strategy that leverages user-specific and item-specific rating distributions to combine the approximation matrices of multiple sets of co-clustering results. Evaluations using real-world datasets demonstrate that WEMAREC outperforms state-of-the-art matrix approximation methods in recommendation accuracy (0.5--11.9% on the MovieLens dataset and 2.2--13.1% on the Netflix dataset) with 3--10X improvement on scalability.
{"title":"WEMAREC: Accurate and Scalable Recommendation through Weighted and Ensemble Matrix Approximation","authors":"Chao Chen, Dongsheng Li, Yingying Zhao, Q. Lv, L. Shang","doi":"10.1145/2766462.2767718","DOIUrl":"https://doi.org/10.1145/2766462.2767718","url":null,"abstract":"Matrix approximation is one of the most effective methods for collaborative filtering-based recommender systems. However, the high computation complexity of matrix factorization on large datasets limits its scalability. Prior solutions have adopted co-clustering methods to partition a large matrix into a set of smaller submatrices, which can then be processed in parallel to improve scalability. The drawback is that the recommendation accuracy is lower as the submatrices only contain subsets of the user-item rating information. This paper presents WEMAREC, a weighted and ensemble matrix approximation method for accurate and scalable recommendation. It builds upon the intuition that (sub)matrices containing more frequent samples of certain user/item/rating tend to make more reliable rating predictions for these specific user/item/rating. WEMAREC consists of two important components: (1) a weighting strategy that is computed based on the rating distribution in each submatrix and applied to approximate a single matrix containing those submatrices; and (2) an ensemble strategy that leverages user-specific and item-specific rating distributions to combine the approximation matrices of multiple sets of co-clustering results. Evaluations using real-world datasets demonstrate that WEMAREC outperforms state-of-the-art matrix approximation methods in recommendation accuracy (0.5--11.9% on the MovieLens dataset and 2.2--13.1% on the Netflix dataset) with 3--10X improvement on scalability.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132210162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This short paper presents initial results from a project in which we investigated differences in how users view relevant and irrelevant Web pages on their visits and revisits. The users' viewing of Web pages was characterized by eye-tracking measures, with particular attention paid to changes in pupil size. The data was collected in a lab-based experiment, in which users (N=32) conducted assigned information search tasks on Wikipedia. We performed non-parametric tests of significance as well as classification. Our findings demonstrate differences in eye-tracking measures on visits and revisits to relevant and irrelevant pages and thus indicate the feasibility of predicting perceived Web document relevance from eye-tracking data. In particular, relative changes in pupil size differed significantly in almost all conditions. Our work extends results from previous studies to more realistic search scenarios and to Web page visits and revisits.
{"title":"Differences in Eye-Tracking Measures Between Visits and Revisits to Relevant and Irrelevant Web Pages","authors":"J. Gwizdka, Yinglong Zhang","doi":"10.1145/2766462.2767795","DOIUrl":"https://doi.org/10.1145/2766462.2767795","url":null,"abstract":"This short paper presents initial results from a project in which we investigated differences in how users view relevant and irrelevant Web pages on their visits and revisits. The users' viewing of Web pages was characterized by eye-tracking measures, with particular attention paid to changes in pupil size. The data was collected in a lab-based experiment, in which users (N=32) conducted assigned information search tasks on Wikipedia. We performed non-parametric tests of significance as well as classification. Our findings demonstrate differences in eye-tracking measures on visits and revisits to relevant and irrelevant pages and thus indicate the feasibility of predicting perceived Web document relevance from eye-tracking data. In particular, relative changes in pupil size differed significantly in almost all conditions. Our work extends results from previous studies to more realistic search scenarios and to Web page visits and revisits.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114819474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Berberich, James Caverlee, Miles Efron, C. Hauff, Vanessa Murdock, Milad Shokouhi, B. Thomee
In this workshop we aim to bring together practitioners and researchers to discuss their recent breakthroughs and the challenges with addressing spatial and temporal information access, both from the algorithmic and the architectural perspectives.
{"title":"SIGIR 2015 Workshop on Temporal, Social and Spatially-aware Information Access (#TAIA2015)","authors":"K. Berberich, James Caverlee, Miles Efron, C. Hauff, Vanessa Murdock, Milad Shokouhi, B. Thomee","doi":"10.1145/2766462.2767860","DOIUrl":"https://doi.org/10.1145/2766462.2767860","url":null,"abstract":"In this workshop we aim to bring together practitioners and researchers to discuss their recent breakthroughs and the challenges with addressing spatial and temporal information access, both from the algorithmic and the architectural perspectives.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131927001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the current big challenges in computer science is the development of data management and retrieval techniques that keep pace with the evolution of contemporary data and with growing expectations of data processing. Digital images have become a common part of both public and enterprise data collections, and there is a natural requirement that retrieval should take greater account of the actual visual content of the image data. In our demonstration, we address the task of retrieving images that are visually and semantically similar to a given example image; the system should be able to evaluate k-nearest-neighbor queries online within a collection containing tens of millions of images. Such a system would be applicable, for instance, on stock photography sites, in e-shops searching product photos, or in collections from a constrained Web image search.
{"title":"Large-scale Image Retrieval using Neural Net Descriptors","authors":"David Novak, Michal Batko, P. Zezula","doi":"10.1145/2766462.2767868","DOIUrl":"https://doi.org/10.1145/2766462.2767868","url":null,"abstract":"One of the current big challenges in computer science is the development of data management and retrieval techniques that keep pace with the evolution of contemporary data and with growing expectations of data processing. Digital images have become a common part of both public and enterprise data collections, and there is a natural requirement that retrieval should take greater account of the actual visual content of the image data. In our demonstration, we address the task of retrieving images that are visually and semantically similar to a given example image; the system should be able to evaluate k-nearest-neighbor queries online within a collection containing tens of millions of images. Such a system would be applicable, for instance, on stock photography sites, in e-shops searching product photos, or in collections from a constrained Web image search.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131972299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Douglas W. Oard, Rashmi Sankepally, Jerome White, A. Jansen, Craig Harman
The development of a new test collection is described in which the task is to search naturally occurring spoken content using naturally occurring spoken queries. To support research on speech retrieval for low-resource settings, the collection includes terms learned by zero-resource term discovery techniques. Use of a new tool designed for exploration of spoken collections provides some additional insight into characteristics of the collection.
{"title":"A Test Collection for Spoken Gujarati Queries","authors":"Douglas W. Oard, Rashmi Sankepally, Jerome White, A. Jansen, Craig Harman","doi":"10.1145/2766462.2767791","DOIUrl":"https://doi.org/10.1145/2766462.2767791","url":null,"abstract":"The development of a new test collection is described in which the task is to search naturally occurring spoken content using naturally occurring spoken queries. To support research on speech retrieval for low-resource settings, the collection includes terms learned by zero-resource term discovery techniques. Use of a new tool designed for exploration of spoken collections provides some additional insight into characteristics of the collection.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133573510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Web search relevance is a billion-dollar challenge, and latecomers are at a disadvantage in web search competition. Vertical search results can be incorporated to enrich web search content, so vertical search relevance is critical to providing differentiated search results. Machine-learning-based ranking algorithms have shown their effectiveness for both web search and vertical search tasks. In this talk, the speaker will not only introduce state-of-the-art ranking algorithms for web search, but also cover the challenges of improving the relevance of various vertical search engines: local search, shopping search, news search, etc.
{"title":"From Web Search Relevance to Vertical Search Relevance","authors":"Yi Chang","doi":"10.1145/2766462.2776787","DOIUrl":"https://doi.org/10.1145/2766462.2776787","url":null,"abstract":"Web search relevance is a billion-dollar challenge, and latecomers are at a disadvantage in web search competition. Vertical search results can be incorporated to enrich web search content, so vertical search relevance is critical to providing differentiated search results. Machine-learning-based ranking algorithms have shown their effectiveness for both web search and vertical search tasks. In this talk, the speaker will not only introduce state-of-the-art ranking algorithms for web search, but also cover the challenges of improving the relevance of various vertical search engines: local search, shopping search, news search, etc.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133045197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}