Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Papers

Session details: Session 4B: Recommending

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/3255925

Paul Bennett

Citations: 0

Subsequence Search in Event-Interval Sequences

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/2766462.2767778

Orestis Kostakis, A. Gionis

We study the problem of subsequence search in databases of event-interval sequences, or e-sequences. In contrast to sequences of instantaneous events, e-sequences contain events that have a duration. In Information Retrieval applications, e-sequences are used for American Sign Language. We show that the subsequence-search problem is NP-hard and provide an exact (worst-case exponential) algorithm. We extend our algorithm to handle different cases of subsequence matching with errors. We then propose the Relation Index, a scheme for speeding up exact retrieval, which we benchmark against several indexing schemes.
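
A minimal sketch of what exact subsequence matching over e-sequences involves, assuming one common formalization (not necessarily the paper's): a query matches if its intervals can be mapped one-to-one onto same-label intervals of the target so that every pairwise Allen relation is preserved. `Interval`, `relation`, and `find_match` are hypothetical names, and the backtracking search is a plain exhaustive baseline that illustrates the worst-case exponential cost; it is neither the authors' algorithm nor the Relation Index.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """One event interval of an e-sequence: a label with a start and end time."""
    label: str
    start: float
    end: float


def _cmp(x: float, y: float) -> int:
    """Three-way comparison returning -1, 0 or 1."""
    return (x > y) - (x < y)


def relation(a: Interval, b: Interval) -> tuple:
    """Encode the relation of a to b by comparing all four endpoint pairs.

    For proper intervals (start < end) this tuple determines which of
    Allen's 13 relations holds (before, meets, overlaps, during, ...).
    """
    return (_cmp(a.start, b.start), _cmp(a.start, b.end),
            _cmp(a.end, b.start), _cmp(a.end, b.end))


def find_match(query: list, target: list) -> Optional[list]:
    """Backtracking search for an injective, label-preserving mapping of query
    intervals onto target intervals that preserves every pairwise relation.

    Returns mapping[i] = index in target of query[i], or None if no match.
    Worst-case exponential in len(query): all assignments may be tried.
    """
    mapping: list = []

    def extend(i: int) -> bool:
        if i == len(query):
            return True
        for t, cand in enumerate(target):
            if t in mapping or cand.label != query[i].label:
                continue
            # Every previously mapped query interval must relate to cand
            # exactly as it relates to query[i].
            if all(relation(query[j], query[i]) == relation(target[mapping[j]], cand)
                   for j in range(i)):
                mapping.append(t)
                if extend(i + 1):
                    return True
                mapping.pop()  # undo and try the next candidate
        return False

    return list(mapping) if extend(0) else None
```

For the target A[0,5], B[3,8], C[9,12], a query in which A overlaps B matches at target indices [0, 1], while a query in which A ends before B starts has no match.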

Citations: 10

Modeling Website Topic Cohesion at Scale to Improve Webpage Classification

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/2766462.2767834

D. Eswaran, Paul N. Bennett, Joseph J. Pfeiffer

Considerable work in web page classification has focused on incorporating the topical structure of the web (e.g., the hyperlink graph) to improve prediction accuracy. However, most of this work has focused on relational or graph-based methods that are impractical to run at scale or in an online environment. This raises the question of whether it is possible to leverage the topical structure of the web while incurring nearly no additional prediction-time cost. To this end, we introduce an approach that adjusts a content-only page classification, moving from the posterior obtained with a global prior to the posterior obtained by incorporating a prior that reflects the topic cohesion of the site. Using ODP data, we empirically demonstrate that our approach yields significant performance increases over a range of topics.
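
The abstract does not give the model, but swapping a global class prior for a site-specific one is commonly done with the standard prior-shift correction sketched below, which assumes that the distribution of page content given a class is the same on every site. This is a sketch under that assumption only; `estimate_site_prior` and its pseudo-count `m` are hypothetical stand-ins for the paper's cohesion-based prior.

```python
def adjust_posterior(p_global: dict, prior_global: dict, prior_site: dict) -> dict:
    """Re-weight a content-only posterior from the global prior to a site prior.

    Standard prior-shift correction, assuming p(content | class) is unchanged:
        p_site(c | x)  proportional to  p_global(c | x) * prior_site(c) / prior_global(c)
    """
    weighted = {c: p_global[c] * prior_site[c] / prior_global[c] for c in p_global}
    z = sum(weighted.values())  # normalizing constant
    return {c: w / z for c, w in weighted.items()}


def estimate_site_prior(site_labels: list, prior_global: dict, m: float = 10.0) -> dict:
    """Hypothetical site prior: class frequencies among the site's labeled pages,
    smoothed toward the global prior with pseudo-count m (sums to 1)."""
    n = len(site_labels)
    return {c: (site_labels.count(c) + m * prior_global[c]) / (n + m)
            for c in prior_global}
```

With a global `sports` prior of 0.1, a site whose ten labeled pages are all `sports` gets a smoothed prior of 0.55, which raises a page's `sports` posterior from 0.4 to about 0.88. The adjustment costs one multiplication per class at prediction time.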

Citations: 1

Leveraging Procedural Knowledge for Task-oriented Search

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/2766462.2767744

Zi Yang, Eric Nyberg

Many search engine users attempt to satisfy an information need by issuing multiple queries, with the expectation that each result will contribute some portion of the required information. Previous research has shown that structured or semi-structured descriptive knowledge bases (such as Wikipedia) can be used to improve search quality and experience for general or entity-centric queries. However, such resources do not have sufficient coverage of procedural knowledge, i.e. what actions should be performed and what factors should be considered to achieve some goal; such procedural knowledge is crucial when responding to task-oriented search queries. This paper provides a first attempt to bridge the gap between two evolving research areas: development of procedural knowledge bases (such as wikiHow) and task-oriented search. We investigate whether task-oriented search can benefit from existing procedural knowledge (search task suggestion) and whether automatic procedural knowledge construction can benefit from users' search activities (automatic procedural knowledge base construction). We propose to create a three-way parallel corpus of queries, query contexts, and task descriptions, and reduce both problems to sequence labeling tasks. We propose a set of textual features and structural features to identify key search phrases from task descriptions, and then adapt similar features to extract wikiHow-style procedural knowledge descriptions from search queries and relevant text snippets. We compare our proposed solution with baseline algorithms, commercial search engines, and the (manually-curated) wikiHow procedural knowledge; experimental results show an improvement of +0.28 to +0.41 in terms of Precision@8 and mean average precision (MAP).
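
The two metrics quoted above can be computed as follows. This sketch uses the usual definitions (Precision@k divides by k even when fewer than k items are returned; average precision divides by the number of relevant items); the abstract does not state which conventions or relevance judgments the paper used, and the function names are illustrative.

```python
def precision_at_k(ranked: list, relevant: set, k: int = 8) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked: list, relevant: set) -> float:
    """Mean of precision values at the rank of each relevant item retrieved."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: list) -> float:
    """MAP over a list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For a ranking `a, b, c, d` with `a` and `c` relevant, Precision@8 is 2/8 = 0.25 and average precision is (1/1 + 2/3)/2, about 0.83.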

Citations: 33

WEMAREC: Accurate and Scalable Recommendation through Weighted and Ensemble Matrix Approximation

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/2766462.2767718

Chao Chen, Dongsheng Li, Yingying Zhao, Q. Lv, L. Shang

Matrix approximation is one of the most effective methods for collaborative filtering-based recommender systems. However, the high computation complexity of matrix factorization on large datasets limits its scalability. Prior solutions have adopted co-clustering methods to partition a large matrix into a set of smaller submatrices, which can then be processed in parallel to improve scalability. The drawback is that the recommendation accuracy is lower as the submatrices only contain subsets of the user-item rating information. This paper presents WEMAREC, a weighted and ensemble matrix approximation method for accurate and scalable recommendation. It builds upon the intuition that (sub)matrices containing more frequent samples of certain user/item/rating tend to make more reliable rating predictions for these specific user/item/rating. WEMAREC consists of two important components: (1) a weighting strategy that is computed based on the rating distribution in each submatrix and applied to approximate a single matrix containing those submatrices; and (2) an ensemble strategy that leverages user-specific and item-specific rating distributions to combine the approximation matrices of multiple sets of co-clustering results. Evaluations using real-world datasets demonstrate that WEMAREC outperforms state-of-the-art matrix approximation methods in recommendation accuracy (0.5-11.9% on the MovieLens dataset and 2.2-13.1% on the Netflix dataset) with a 3-10x improvement in scalability.
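
A rough sketch of the ensemble intuition described above: each co-clustering yields a base prediction for a (user, item) pair, and a base prediction is trusted more when its submatrix holds a larger share of that user's and that item's ratings. The weight 1 + β_u·f_u + β_i·f_i and the names `coverage`, `ensemble_predict`, `beta_u`, and `beta_i` are hypothetical; WEMAREC's actual weights are derived from rating distributions and are not reproduced here.

```python
def coverage(ratings, row_cluster, col_cluster, u, i):
    """Under one co-clustering, return (f_u, f_i): the fraction of user u's
    ratings and of item i's ratings that fall in the same block as (u, i)."""
    u_items = [j for (v, j) in ratings if v == u]
    i_users = [v for (v, j) in ratings if j == i]
    # u's row cluster is fixed, so a rating (u, j) shares the block iff j's column cluster matches i's
    fu = sum(col_cluster[j] == col_cluster[i] for j in u_items) / len(u_items) if u_items else 0.0
    # likewise, a rating (v, i) shares the block iff v's row cluster matches u's
    fi = sum(row_cluster[v] == row_cluster[u] for v in i_users) / len(i_users) if i_users else 0.0
    return fu, fi


def ensemble_predict(predictions, coverages, beta_u=1.0, beta_i=1.0):
    """Weighted average of base predictions, one (f_u, f_i) pair per base model.
    Hypothetical weighting: more in-block evidence gives a larger weight."""
    weights = [1.0 + beta_u * fu + beta_i * fi for fu, fi in coverages]
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)
```

With zero coverage everywhere, the ensemble reduces to a plain average of the base predictions; the β parameters control how far it departs from that average.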

Citations: 43

Differences in Eye-Tracking Measures Between Visits and Revisits to Relevant and Irrelevant Web Pages

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/2766462.2767795

J. Gwizdka, Yinglong Zhang

This short paper presents initial results from a project in which we investigated differences in how users view relevant and irrelevant Web pages on their visits and revisits. The users' viewing of Web pages was characterized by eye-tracking measures, with particular attention paid to changes in pupil size. The data were collected in a lab-based experiment in which users (N=32) conducted assigned information search tasks on Wikipedia. We performed non-parametric tests of significance as well as classification. Our findings demonstrate differences in eye-tracking measures on visits and revisits to relevant and irrelevant pages and thus indicate the feasibility of predicting perceived Web document relevance from eye-tracking data. In particular, relative changes in pupil size differed significantly in almost all conditions. Our work extends results from previous studies to more realistic search scenarios and to Web page visits and revisits.
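
The abstract mentions relative pupil-size changes and non-parametric significance tests without naming the test. As an illustration only, the sketch below computes a relative change against a baseline and the Mann-Whitney U test, one common non-parametric two-sample test, via its normal approximation. Whether the authors used this test (or, for example, a paired test across visits and revisits) is not stated, and all names are hypothetical.

```python
import math


def relative_change(value: float, baseline: float) -> float:
    """Relative change of a pupil-size measure with respect to a baseline."""
    return (value - baseline) / baseline


def mann_whitney_u(x: list, y: list) -> float:
    """U statistic for sample x: pairs where x is larger, ties counting one half."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def mann_whitney_p(x: list, y: list) -> float:
    """Two-sided p-value from the normal approximation (no tie or continuity
    correction); only adequate for moderately large samples."""
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (mann_whitney_u(x, y) - mu) / sigma
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF at |z|
    return 2 * (1 - phi)
```

A rank-based test like this compares, for example, relative pupil changes on relevant pages against those on irrelevant pages without assuming that either is normally distributed.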

Citations: 34

SIGIR 2015 Workshop on Temporal, Social and Spatially-aware Information Access (#TAIA2015)

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/2766462.2767860

K. Berberich, James Caverlee, Miles Efron, C. Hauff, Vanessa Murdock, Milad Shokouhi, B. Thomee

In this workshop we aim to bring together practitioners and researchers to discuss their recent breakthroughs and the challenges with addressing spatial and temporal information access, both from the algorithmic and the architectural perspectives.

Citations: 1

Large-scale Image Retrieval using Neural Net Descriptors

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/2766462.2767868

David Novak, Michal Batko, P. Zezula

One of the current big challenges in computer science is the development of data management and retrieval techniques that would keep pace with the evolution of contemporary data and with the growing expectations on data processing. Various digital images have become a common part of both public and enterprise data collections, and there is a natural requirement that retrieval should take more account of the actual visual content of the image data. In our demonstration, we aim at the task of retrieving images that are visually and semantically similar to a given example image; the system should be able to evaluate k-nearest-neighbor queries online within a collection containing tens of millions of images. Such a system would be applicable, for instance, on stock photography sites, in e-shops searching in product photos, or in collections from a constrained Web image search.
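
The k-nearest-neighbor query described above can be stated as a linear scan over descriptor vectors. This is a baseline sketch assuming the descriptors are already extracted as fixed-length vectors compared by Euclidean distance; it costs O(N·d) per query, which is why a collection of tens of millions of images calls for an index, and it is not the demonstrated system. `knn` and the toy vectors are hypothetical.

```python
import heapq
import math


def knn(query: list, collection: dict, k: int) -> list:
    """Return the k (image_id, distance) pairs closest to query by linear scan.

    collection maps image ids to descriptor vectors of the same length as query.
    """
    scored = ((math.dist(query, vec), img_id) for img_id, vec in collection.items())
    # nsmallest keeps only k candidates in a heap while scanning everything once
    return [(img_id, d) for d, img_id in heapq.nsmallest(k, scored)]
```

For example, with toy two-dimensional descriptors `{"a": [0, 0], "b": [1, 0], "c": [5, 5]}` and query `[0.1, 0.0]`, the two nearest images are `a` and then `b`.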

Citations: 29

A Test Collection for Spoken Gujarati Queries

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/2766462.2767791

Douglas W. Oard, Rashmi Sankepally, Jerome White, A. Jansen, Craig Harman

The development of a new test collection is described in which the task is to search naturally occurring spoken content using naturally occurring spoken queries. To support research on speech retrieval for low-resource settings, the collection includes terms learned by zero-resource term discovery techniques. Use of a new tool designed for exploration of spoken collections provides some additional insight into characteristics of the collection.

Citations: 0

From Web Search Relevance to Vertical Search Relevance

Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pub Date : 2015-08-09 DOI: 10.1145/2766462.2776787

Yi Chang

Web search relevance is a billion-dollar challenge, and a search engine that lags behind in relevance is at a disadvantage in web search competition. Vertical search results can be incorporated to enrich web search content; therefore, vertical search relevance is critical to providing differentiated search results. Machine-learning-based ranking algorithms have shown their effectiveness for both web search and vertical search tasks. In this talk, the speaker will not only introduce state-of-the-art ranking algorithms for web search, but also cover the challenges of improving the relevance of various vertical search engines: local search, shopping search, news search, etc.

Citations: 1
