
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Papers

Influence of Vertical Result in Web Search Examination
Zeyang Liu, Yiqun Liu, K. Zhou, Min Zhang, Shaoping Ma
Research on how users examine results on search engine result pages (SERPs) helps improve result ranking, advertisement placement, performance evaluation and search UI design. Examination behavior on organic search results (also known as "ten blue links") has been well studied in existing work. However, there has been no thorough investigation of how users examine SERPs with verticals. In practical Web search, a large fraction of SERPs are served with one or more verticals, so it is vital to understand how vertical results influence search examination behavior. In this paper, we focus on five popular vertical types. We study their influence on users' examination processes both when they are relevant and when they are irrelevant to the search query. We collected examination behavior data with an eye-tracking device. Using these data, we show the existence of vertical-aware user behavior effects:

- a vertical attraction effect;
- an examination cut-off effect in the presence of a relevant vertical;
- an examination spill-over effect in the presence of an irrelevant vertical.

Furthermore, we are among the first to systematically investigate internal examination behavior within vertical results. We believe that this work will improve our understanding of user interactions with federated search engines and benefit the construction of search performance evaluations.
DOI: https://doi.org/10.1145/2766462.2767714 | Published: 2015-08-09
Citations: 60
Context-aware Point-of-Interest Recommendation Using Tensor Factorization with Social Regularization
Lina Yao, Quan Z. Sheng, Yongrui Qin, Xianzhi Wang, A. Shemshadi, Qi He
Point-of-Interest (POI) recommendation is a new type of recommendation task that has emerged with the prevalence of location-based social networks in recent years. Compared with traditional tasks, it focuses more on personalized, context-aware recommendation results to provide a better user experience. To address this new challenge, we propose a Collaborative Filtering method based on Non-negative Tensor Factorization. This method generalizes the Matrix Factorization approach: it models multi-dimensional contextual information with a high-order tensor instead of the traditional User-Location matrix. Factorizing this tensor yields a compact model of the data that is especially suitable for context-aware POI recommendation. In addition, we fuse users' social relations into the factorization as regularization terms to improve recommendation accuracy. Experimental results on real-world datasets demonstrate the effectiveness of our approach.
DOI: https://doi.org/10.1145/2766462.2767794 | Published: 2015-08-09
Citations: 107
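Below is a minimal sketch of the general idea. It assumes a small dense user × location × context tensor and plain projected gradient descent. The following are illustrative assumptions, not the authors' formulation or optimizer:

- the function name `ntf_social`;
- the squared-difference social penalty between friends' user factors;
- the optional observation mask;
- all hyperparameter values.

```python
import numpy as np

def ntf_social(X, friends, mask=None, rank=3, lam_social=0.1, lam_reg=0.01,
               lr=0.01, epochs=500, seed=0):
    """Hypothetical sketch: non-negative CP factorization of a
    user x location x context tensor X via projected gradient descent.
    The social term lam_social * sum ||U[u] - U[v]||^2 over friend pairs
    (u, v) pulls friends' latent user factors toward each other."""
    rng = np.random.default_rng(seed)
    n_u, n_l, n_c = X.shape
    W = np.ones_like(X) if mask is None else mask  # 1 marks an observed entry
    U, L, C = rng.random((n_u, rank)), rng.random((n_l, rank)), rng.random((n_c, rank))
    history = []
    for _ in range(epochs):
        # Residual of the rank-`rank` CP reconstruction on observed entries.
        E = W * (np.einsum("uk,lk,ck->ulc", U, L, C) - X)
        social = sum(np.sum((U[u] - U[v]) ** 2) for u, v in friends)
        history.append(np.sum(E ** 2) + lam_social * social
                       + lam_reg * (np.sum(U ** 2) + np.sum(L ** 2) + np.sum(C ** 2)))
        # Gradients of the squared error plus L2 regularization.
        gU = 2 * np.einsum("ulc,lk,ck->uk", E, L, C) + 2 * lam_reg * U
        gL = 2 * np.einsum("ulc,uk,ck->lk", E, U, C) + 2 * lam_reg * L
        gC = 2 * np.einsum("ulc,uk,lk->ck", E, U, L) + 2 * lam_reg * C
        # Gradient of the social regularizer for each friend pair.
        for u, v in friends:
            diff = U[u] - U[v]
            gU[u] += 2 * lam_social * diff
            gU[v] -= 2 * lam_social * diff
        # Gradient step, then project back onto the non-negative orthant.
        U = np.maximum(U - lr * gU, 0.0)
        L = np.maximum(L - lr * gL, 0.0)
        C = np.maximum(C - lr * gC, 0.0)
    return U, L, C, history
```

The sketch then gives a predicted preference score for user `u` at location `l` in context `c` as `np.dot(U[u] * L[l], C[c])`.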
Inter-Category Variation in Location Search
Chia-Jung Lee, Nick Craswell, Vanessa Murdock
When searching for place entities such as businesses or points of interest, the desired place may be close (finding the nearest ATM) or far away (finding a hotel in another city). Understanding the role of distance in predicting user interests can guide the design of location search and recommendation systems. We analyze a large dataset of location searches on GPS-enabled mobile devices spanning 15 location categories. We model user-location distance in two ways: raw geographic distance (kilometers) and intervening opportunities (the selected place is the nth closest). Both models help predict user interests, with the intervening-opportunities model performing somewhat better. We find significant inter-category variation. For instance, the closest movie theater is selected in 17.7% of cases, but the closest restaurant in only 2.1% of cases. Overall, we recommend taking category information into account when modeling users' location preferences in search and recommendation systems.
DOI: https://doi.org/10.1145/2766462.2767797 | Published: 2015-08-09
Citations: 2
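The two distance models in the abstract above can be made concrete with a short sketch. It computes:

- great-circle distance in kilometres (haversine);
- the intervening-opportunities rank, i.e. the selected place is the nth closest candidate;
- the per-category share of searches in which the closest place was chosen.

The function names and the tuple layout of `searches` are hypothetical. The paper's actual predictive models are not reproduced here.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def intervening_rank(user, selected, candidates):
    """1-based rank of the selected place among candidates by distance (nth closest)."""
    d_sel = haversine_km(*user, *selected)
    closer = sum(1 for c in candidates if haversine_km(*user, *c) < d_sel)
    return closer + 1

def closest_share(searches):
    """searches: iterable of (category, user, selected, candidates) tuples.
    Returns, per category, the fraction of searches where the closest place was selected."""
    hits, totals = defaultdict(int), defaultdict(int)
    for cat, user, sel, cands in searches:
        totals[cat] += 1
        hits[cat] += intervening_rank(user, sel, cands) == 1
    return {c: hits[c] / totals[c] for c in totals}
```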
Impact of Surrogate Assessments on High-Recall Retrieval
Adam Roegiest, G. Cormack, C. Clarke, Maura R. Grossman
We are concerned with the effect of using a surrogate assessor to train a passive (i.e., batch) supervised-learning method that ranks documents for subsequent review. The effectiveness of the ranking is then evaluated by a different assessor deemed to be authoritative. Previous studies suggest that surrogate assessments may be a reasonable proxy for authoritative assessments for this task. Nonetheless, concern persists in some application domains, such as electronic discovery, that the learning method will amplify errors in surrogate training assessments and materially degrade performance. We re-analyze data used in previous studies and demonstrate the following for passive supervised-learning methods. Compared with using the same deemed-authoritative assessor for both training and assessment, using surrogate assessments for training can substantially impair classifier performance. In particular, suppose a single surrogate replaces the authoritative assessor for training. The resulting ranking often must be traversed much deeper to achieve the same level of recall as the ranking produced with authoritative training. We also show that steps can be taken to mitigate, and sometimes overcome, the impact of surrogate training assessments:

- relevance assessments may be diversified by using multiple surrogates;
- a more liberal view of relevance can be adopted by having the surrogate label borderline documents as relevant.

With these steps, rankings derived from surrogate assessments can match, and sometimes exceed, the performance of the ranking that would have been achieved had the authority been used for training.
Finally, we show that our results still hold when the roles of surrogate and authority are interchanged. This indicates that the results may simply reflect differing conceptions of relevance between surrogate and authority, rather than special skill or knowledge that the authority has and the surrogate lacks.
DOI: https://doi.org/10.1145/2766462.2767754 | Published: 2015-08-09
Citations: 8
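One way to quantify "how far down a ranking must be traversed" is the depth needed to reach a target recall, measured against the authoritative assessor's labels. The sketch below is an illustrative assumption. The function name, the 75% default target and the toy rankings are hypothetical, and this is not necessarily the measure used in the paper.

```python
def depth_for_recall(ranking, relevant, target=0.75):
    """Smallest k such that the top-k of `ranking` contains at least a `target`
    fraction of `relevant` (documents judged relevant by the authoritative assessor).
    Returns None if the ranking never reaches the target."""
    if not relevant:
        return 0
    found = 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            if found / len(relevant) >= target:
                return k
    return None

# Hypothetical toy data: the same authoritative labels, two rankings
# (e.g. one trained on authoritative labels, one on surrogate labels).
relevant = {"a", "b", "c", "d"}
ranking_auth = ["a", "b", "x", "c", "d", "y"]
ranking_surr = ["x", "a", "y", "z", "b", "c", "d"]
```

Under this measure, the surrogate-trained toy ranking must be read to depth 6 rather than 4 to reach 75% recall.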
Probabilistic Multileave for Online Retrieval Evaluation
Anne Schuth, Robert-Jan Bruintjes, Fritjof Büttner, J. Doorn, C. Groenland, Harrie Oosterhuis, Cong-Nguyen Tran, Bastiaan S. Veeling, Jos van der Velde, R. Wechsler, David Woudenberg, M. de Rijke
Online evaluation methods for information retrieval use implicit signals, such as user clicks, to infer preferences between rankers. A highly sensitive way of inferring these preferences is through interleaved comparisons. Recently, interleaved comparison methods that allow more than two rankers to be evaluated simultaneously have been introduced. These so-called multileaving methods are even more sensitive than their interleaving counterparts. Probabilistic interleaving, whose main selling point is the potential to reuse historical data, has no multileaving counterpart yet. We propose probabilistic multileave and show empirically that it is highly sensitive and unbiased. An important implication of this result is that historical interactions with multileaved comparisons can be reused, allowing ranker comparisons that need much less user interaction data. Furthermore, we show that, unlike earlier sensitive multileaving methods, our method scales well as the number of rankers increases.
DOI: https://doi.org/10.1145/2766462.2767838 | Published: 2015-08-09
Citations: 35
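Below is a simplified sketch of probabilistic multileaving. It assumes the rank-based distribution used by probabilistic interleaving: P(d) proportional to 1/rank(d)^tau, here with tau = 3. The list is built as follows:

1. At each position, pick a ranker uniformly at random.
2. Sample a document from that ranker's distribution.
3. Remove that document from every ranker's distribution.

For brevity, clicks are credited only to the ranker that actually contributed each document. The paper's method uses a more elaborate credit assignment that is not shown here. All names are hypothetical.

```python
import random

def rank_distribution(ranking, tau=3.0):
    """Unnormalized weights P(d) ~ 1 / rank(d)**tau, with 1-based ranks."""
    return {d: 1.0 / (r ** tau) for r, d in enumerate(ranking, start=1)}

def multileave(rankings, length, tau=3.0, rng=None):
    """Build a multileaved list; returns (documents, index of contributing ranker per position)."""
    rng = rng or random.Random()
    weights = [rank_distribution(r, tau) for r in rankings]
    result, assigned = [], []
    while len(result) < length:
        i = rng.randrange(len(rankings))  # pick a ranker uniformly at random
        w = weights[i]
        if not w:  # this ranker has no documents left
            if all(not x for x in weights):
                break  # every ranker is exhausted
            continue
        docs = list(w)
        doc = rng.choices(docs, weights=[w[d] for d in docs])[0]
        result.append(doc)
        assigned.append(i)
        for x in weights:  # remove the document from every ranker's distribution
            x.pop(doc, None)
    return result, assigned

def credit(result, assigned, clicked, n_rankers):
    """Count clicks per contributing ranker (simplified, non-marginalized credit)."""
    scores = [0] * n_rankers
    for doc, i in zip(result, assigned):
        if doc in clicked:
            scores[i] += 1
    return scores
```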
How Random Decisions Affect Selective Distributed Search
Zhuyun Dai, Yubin Kim, James P. Callan
Selective distributed search is a retrieval architecture that reduces search costs by partitioning a corpus into topical shards such that only a few shards need to be searched for each query. Prior research created topical shards by using random seed documents to cluster a random sample of the full corpus. The resource selection algorithm might use a different random sample of the corpus. These random components make selective search non-deterministic. This paper studies how these random components affect experimental results. Experiments on two ClueWeb09 corpora and four query sets show that in spite of random components, selective search is stable for most queries.
DOI: https://doi.org/10.1145/2766462.2767796 | Published: 2015-08-09
Citations: 4
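The sketch below shows where the randomness comes from in shard construction for selective search. It assumes dense document vectors and plain k-means:

1. A random sample of the corpus is clustered, starting from randomly chosen seed documents.
2. Every document is then assigned to its nearest centroid.

Shards are selected for a query by centroid distance. This is a simplification that stands in for the paper's resource selection algorithm. Function names and parameters are hypothetical. Rerunning `build_shards` with a different `seed` shows how the partition can change.

```python
import numpy as np

def build_shards(doc_vecs, k, sample_frac=0.5, iters=10, seed=0):
    """Cluster a random sample with k-means from random seed documents,
    then assign every document to its nearest centroid (its shard)."""
    rng = np.random.default_rng(seed)
    n = len(doc_vecs)
    sample = doc_vecs[rng.choice(n, size=max(k, int(sample_frac * n)), replace=False)]
    centroids = sample[rng.choice(len(sample), size=k, replace=False)].copy()  # random seeds
    for _ in range(iters):
        dist = ((sample[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            members = sample[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(0)
    dist_all = ((doc_vecs[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return dist_all.argmin(1), centroids

def select_shards(query_vec, centroids, n_shards=2):
    """Simplified resource selection: the shards whose centroids are nearest the query."""
    dist = ((centroids - query_vec) ** 2).sum(1)
    return list(np.argsort(dist)[:n_shards])
```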
GeoSoCa: Exploiting Geographical, Social and Categorical Correlations for Point-of-Interest Recommendations
Jiadong Zhang, Chi-Yin Chow
Recommending preferred points-of-interest (POIs), e.g., museums and restaurants, to users has become an important feature of location-based social networks (LBSNs). It helps people explore new places and helps businesses discover potential customers. However, users check in at only a few POIs in an LBSN, so user-POI check-in interactions are highly sparse, which poses a major challenge for POI recommendation. To tackle this challenge, we propose a new POI recommendation approach called GeoSoCa that exploits geographical, social and categorical correlations among users and POIs. These correlations can be learned from users' historical check-in data on POIs. They are then used to predict a user's relevance score for an unvisited POI, from which recommendations are made. GeoSoCa works in three steps:

1. It uses a kernel estimation method with an adaptive bandwidth to determine a personalized check-in distribution of POIs for each user. This naturally models the geographical correlations between POIs.
2. It aggregates the check-in frequency or rating of a user's friends on a POI and models this social check-in frequency or rating as a power-law distribution, exploiting the social correlations between users.
3. It uses a user's bias toward a POI category to weight the popularity of a POI in that category and models the weighted popularity as a power-law distribution, leveraging the categorical correlations between POIs.

Finally, we conduct a comprehensive performance evaluation of GeoSoCa using two large-scale real-world check-in data sets collected from Foursquare and Yelp.
Experimental results show that GeoSoCa achieves significantly better recommendation quality than other state-of-the-art POI recommendation techniques.
DOI: https://doi.org/10.1145/2766462.2767711 | Published: 2015-08-09
Citations: 297
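The adaptive-bandwidth kernel estimation step can be illustrated with a standard Abramson-style adaptive kernel density estimate over one user's check-in coordinates. A fixed-bandwidth pilot estimate is computed first. Each check-in then gets a local bandwidth that grows where check-ins are sparse. This is a sketch under the following assumptions:

- an isotropic Gaussian kernel;
- planar coordinates;
- `h = 1.0` and `alpha = 0.5`.

GeoSoCa's exact kernel and bandwidth formula may differ, and the social and categorical components are omitted.

```python
import numpy as np

def gaussian_kernel_2d(diff, h):
    """Isotropic 2-D Gaussian kernel; diff has shape (..., 2), h broadcasts over leading axes."""
    return np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2)) / (2 * np.pi * h ** 2)

def adaptive_kde(checkins, h=1.0, alpha=0.5):
    """Return (density function, per-check-in local bandwidths) for one user's check-ins."""
    pts = np.asarray(checkins, dtype=float)
    # Pilot density at each check-in, using a fixed bandwidth h.
    pilot = gaussian_kernel_2d(pts[:, None, :] - pts[None, :, :], h).mean(axis=1)
    g = np.exp(np.log(pilot).mean())    # geometric mean of the pilot densities
    local_h = h * (g / pilot) ** alpha  # wider kernels where check-ins are sparse

    def density(x):
        d = np.asarray(x, dtype=float) - pts  # displacement from every check-in
        return float(gaussian_kernel_2d(d, local_h).mean())

    return density, local_h
```

The returned `density` function can score an unvisited POI location for that user. Higher values mean the POI lies closer to where the user tends to check in.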
An Efficient and Scalable MetaFeature-based Document Classification Approach based on Massively Parallel Computing
Sérgio D. Canuto, Marcos André Gonçalves, W. M. D. Santos, Thierson Couto, W. Martins
The unprecedented growth of available data has stimulated the development of new methods for organizing and extracting useful knowledge from this immense amount of data. Automatic Document Classification (ADC) is one such method. It uses machine learning techniques to build models that automatically associate documents with well-defined semantic classes. ADC is the basis of many important applications, such as language identification, sentiment analysis, recommender systems and spam filtering. Recently, meta-features have been shown to substantially improve the effectiveness of ADC algorithms. In particular, meta-features that combine local information (through kNN-based features) and global information (through category centroids) have produced promising results. However, generating these meta-features is very costly in terms of both memory consumption and runtime, since the kNN algorithm must be called repeatedly. We take advantage of current manycore GPU architectures and present a massively parallel version of the kNN algorithm for high-dimensional, sparse datasets (which is the case for ADC). Compared with a state-of-the-art parallel baseline, our experimental results show speedups of up to 15x and a reduction in memory consumption of more than 5000x. This opens up the possibility of applying meta-feature-based classification to large document collections. Otherwise, such classification would take too much time or require an expensive computational platform.
DOI: https://doi.org/10.1145/2766462.2767743 | Published: 2015-08-09
Citations: 10
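The ADC abstract above builds meta-features by combining local evidence from k nearest neighbors with global evidence from category centroids. The minimal Python sketch below illustrates that combination on a CPU. It assumes sparse TF-IDF-style vectors stored as dicts, cosine similarity, and a hypothetical feature set of two values per class: the summed similarity of that class's members among the k nearest neighbors, and the similarity to that class's centroid. It is not the authors' exact meta-feature definition and not their GPU kNN. The brute-force neighbor scan in `meta_features` is simply the costly step that the paper parallelizes.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a sparse document vector maps term -> weight (e.g. TF-IDF).
SparseVec = Dict[str, float]


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector for the dot product
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_centroids(train: Sequence[Tuple[SparseVec, str]]) -> Dict[str, SparseVec]:
    """Mean vector of each class: the 'global' information."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in train:
        counts[label] += 1
        for t, w in vec.items():
            sums[label][t] += w
    return {c: {t: w / counts[c] for t, w in s.items()} for c, s in sums.items()}


def meta_features(doc: SparseVec, train: Sequence[Tuple[SparseVec, str]],
                  classes: Sequence[str], k: int) -> List[float]:
    """Two features per class: summed kNN similarity (local) and centroid similarity (global)."""
    # Exhaustive kNN: compare the document against every training document.
    neighbors = sorted(
        ((cosine(doc, vec), label) for vec, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Recomputed per call for brevity; a real pipeline would precompute centroids once.
    centroids = class_centroids(train)
    features: List[float] = []
    for c in classes:
        features.append(sum(sim for sim, label in neighbors if label == c))
        features.append(cosine(doc, centroids[c]) if c in centroids else 0.0)
    return features
```

For example, with `k=2` and the query document `{"a": 1.0}`, the two training documents that share term `a` fill the whole neighborhood, so only the class those documents belong to receives non-zero features.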
Load-sensitive CPU Power Management for Web Search Engines
Matteo Catena, C. Macdonald, N. Tonellotto
Web search engine companies require power-hungry data centers with thousands of servers to perform searches efficiently at large scale. This allows search engines to serve high query arrival rates with low latency, but the servers' power consumption raises economic and environmental concerns. Existing power-saving techniques sacrifice a server's raw performance for lower power consumption by scaling the frequency of its CPU according to its utilization. For instance, current Linux kernels include frequency governors, i.e., mechanisms designed to dynamically throttle the CPU's operating frequency. However, such general-purpose techniques work at the operating-system level and have no knowledge of the server's querying operations. In this work, we propose to delegate CPU power management to search-engine-specific governors. These can leverage knowledge derived from the querying operations, such as query server utilization and load. By exploiting this additional knowledge, we can throttle the CPU frequency appropriately and thereby reduce the query server's power consumption. Experiments are conducted on the TREC ClueWeb09 corpus using the query stream from the MSN 2006 query log. Results show that we can reduce a server's power consumption by up to ~24%, with only limited drawbacks in effectiveness compared with a system that always runs at maximum CPU frequency to promote query processing quality.
DOI: https://doi.org/10.1145/2766462.2767809 (published 2015-08-09)
Citations: 12
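To make the contrast between utilization-driven OS governors and load-sensitive, search-engine-specific governors more concrete, here is a minimal Python sketch of one possible load-sensitive policy. It assumes the query server can estimate the CPU cycles that each queued query still needs and that a latency deadline is known. The function `choose_frequency`, its inputs, and the deadline rule are hypothetical illustrations. They are not the governors proposed in the paper, and the sketch does not touch any real cpufreq interface.

```python
from typing import Sequence


def choose_frequency(queued_cycles: Sequence[float], deadline_s: float,
                     freqs_hz: Sequence[float]) -> float:
    """Pick the lowest CPU frequency expected to drain the query queue before the deadline.

    queued_cycles: hypothetical per-query estimates of the CPU cycles still needed by
    the queries waiting at this query server (a stand-in for the server's load).
    """
    total_cycles = sum(queued_cycles)
    for f in sorted(freqs_hz):
        # Time needed to finish all queued work if the core runs at frequency f.
        if total_cycles / f <= deadline_s:
            return f
    # Even the highest frequency misses the deadline, so run flat out.
    return max(freqs_hz)
```

With available frequencies of 1.2, 2.0 and 3.0 GHz and 1.5 billion cycles queued against a 1 s deadline, the sketch settles on 2.0 GHz rather than the maximum. Running below the maximum whenever the load allows is where a power saving would come from, whereas an idle queue drops straight to the lowest frequency.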
Comparing Approaches for Query Autocompletion
Giovanni Di Santo, R. McCreadie, C. Macdonald, I. Ounis
Within a search engine, query auto-completion aims to predict the final query the user intends to enter as they type, with the goal of reducing query entry time and potentially preparing the search results before the query is submitted. There are many approaches to automatically ranking candidate queries for auto-completion. However, no existing study compares these approaches on a single dataset. Hence, in this paper, we present a comparative study of current approaches for ranking candidate query completions while the user is typing the query. Using a query log and document corpus from a commercial medical search engine, we study the performance of 11 candidate query ranking approaches from the literature and analyze where each is effective. We show that the most effective approach to query auto-completion depends largely on the number of characters the user has typed so far, with different approaches being most effective for short and long prefixes. Moreover, we show that if personalized information about the searcher is available, this additional information can be used to rank candidate query completions more effectively, regardless of the prefix length.
DOI: https://doi.org/10.1145/2766462.2767829 (published 2015-08-09)
Citations: 26
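A common baseline for query auto-completion ranks the logged queries that start with the typed prefix by their frequency in the query log (often called Most Popular Completion). The Python sketch below shows that baseline together with a hypothetical personalized variant that linearly blends global and per-user frequencies using an assumed `weight` parameter. Neither function is claimed to be one of the 11 approaches evaluated in the paper; the sketch only illustrates how per-user history can reorder prefix-based candidates.

```python
from collections import Counter
from typing import List, Sequence


def most_popular_completions(prefix: str, query_log: Sequence[str], n: int = 3) -> List[str]:
    """Rank logged queries starting with `prefix` by how often they were issued."""
    counts = Counter(q for q in query_log if q.startswith(prefix))
    # Highest frequency first; ties broken alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in ranked[:n]]


def personalized_completions(prefix: str, query_log: Sequence[str],
                             user_history: Sequence[str], n: int = 3,
                             weight: float = 0.5) -> List[str]:
    """Blend global and per-user relative frequencies (hypothetical linear mixture)."""
    global_counts = Counter(q for q in query_log if q.startswith(prefix))
    user_counts = Counter(q for q in user_history if q.startswith(prefix))
    g_total = sum(global_counts.values()) or 1  # avoid division by zero
    u_total = sum(user_counts.values()) or 1
    candidates = set(global_counts) | set(user_counts)

    def score(q: str) -> float:
        # Counter returns 0 for queries missing from either source.
        return (1 - weight) * global_counts[q] / g_total + weight * user_counts[q] / u_total

    return sorted(candidates, key=lambda q: (-score(q), q))[:n]
```

For the prefix "flu", a log in which "flu symptoms" is most frequent puts it first under the baseline; a user whose own history contains "flu vaccine" twice sees "flu vaccine" move from last to first place in the personalized ranking.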