
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval: Latest Publications

Session details: Session 5A: Deep Learning
Berthier Ribeiro-Neto
{"title":"Session details: Session 5A: Deep Learning","authors":"Berthier Ribeiro-Neto","doi":"10.1145/3255927","DOIUrl":"https://doi.org/10.1145/3255927","url":null,"abstract":"","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125120979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Modelling Term Dependence with Copulas
Carsten Eickhoff, A. D. Vries, Thomas Hofmann
Many generative language and relevance models assume conditional independence between the likelihoods of observing individual terms. This assumption is obviously naive, but also hard to replace or relax. Only very few term pairs actually show significant conditional dependencies, while the vast majority of co-located terms have no implications for the document's topical nature or its relevance to a given topic. It is exactly this situation that we capture in a formal framework: a limited number of meaningful dependencies in a system of largely independent observations. Making use of the formal copula framework, we describe the strength of causal dependency in terms of a number of established term co-occurrence metrics. Our experiments, based on the well-known ClueWeb'12 corpus and TREC 2013 topics, indicate significant gains in retrieval performance when we formally account for the dependency structure underlying pieces of natural language text.
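The abstract does not spell out the estimator, but the core idea, coupling largely independent term marginals through a copula whose parameter encodes co-occurrence strength, can be sketched in a few lines of Python. Everything below (the Gumbel family, the mapping from a co-occurrence metric to the copula parameter, the function names) is an illustrative assumption, not the paper's actual model.

```python
import math

def gumbel_copula(u: float, v: float, theta: float) -> float:
    """Bivariate Gumbel copula C(u, v); theta >= 1, and theta = 1 recovers independence (u * v)."""
    return math.exp(-(((-math.log(u)) ** theta + (-math.log(v)) ** theta) ** (1.0 / theta)))

def joint_term_score(p_t1: float, p_t2: float, cooccurrence_strength: float) -> float:
    """Couple two marginal term probabilities; a stronger co-occurrence signal (e.g. PMI) maps to a
    larger theta and hence stronger modelled dependence. This mapping is a placeholder."""
    theta = 1.0 + max(0.0, cooccurrence_strength)
    return gumbel_copula(p_t1, p_t2, theta)

# Independent terms (theta = 1) fall back to the product of the marginals:
assert abs(gumbel_copula(0.2, 0.3, 1.0) - 0.06) < 1e-9
print(joint_term_score(0.2, 0.3, cooccurrence_strength=2.0))
```

With theta fixed at 1 the copula reduces to the usual conditional-independence product; larger theta strengthens the modelled dependence for the few term pairs that warrant it.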
{"title":"Modelling Term Dependence with Copulas","authors":"Carsten Eickhoff, A. D. Vries, Thomas Hofmann","doi":"10.1145/2766462.2767831","DOIUrl":"https://doi.org/10.1145/2766462.2767831","url":null,"abstract":"Many generative language and relevance models assume conditional independence between the likelihood of observing individual terms. This assumption is obviously naive, but also hard to replace or relax. There are only very few term pairs that actually show significant conditional dependencies while the vast majority of co-located terms has no implications on the document's topical nature or relevance towards a given topic. It is exactly this situation that we capture in a formal framework: A limited number of meaningful dependencies in a system of largely independent observations. Making use of the formal copula framework, we describe the strength of causal dependency in terms of a number of established term co-occurrence metrics. Our experiments based on the well known ClueWeb'12 corpus and TREC 2013 topics indicate significant performance gains in terms of retrieval performance when we formally account for the dependency structure underlying pieces of natural language text.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125223440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
On the Reusability of Open Test Collections
Seyyed Hadi Hashemi, C. Clarke, Adriel Dean-Hall, J. Kamps, Julia Kiseleva
Creating test collections for modern search tasks is increasingly challenging due to the growing scale and dynamic nature of content, and the need for richer contextualization of the statements of request. To address these issues, the TREC Contextual Suggestion Track explored an open test collection, where participants were allowed to submit any web page as a result for a personalized venue recommendation task. This raises the question of the reusability of the resulting test collection: How does the open nature affect the pooling process? Can participants reliably evaluate variant runs with the resulting qrels? Can other teams evaluate new runs reliably? In short, does the set of pooled and judged documents effectively produce a post hoc test collection? Our main findings are the following: First, while there is a strongly significant rank correlation, the effect of pooling is notable and results in underestimation of performance, implying the evaluation of non-pooled systems should be done with great care. Second, we extensively analyze the impact of the open corpus on the fraction of judged documents, explaining how low recall affects the reusability, and how personalization and low pooling depth aggravate that problem. Third, we outline a potential solution by deriving a fixed corpus from open web submissions.
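As a rough illustration of the kind of reusability check described above (not the track's actual analysis), the sketch below compares system orderings under two judgment sets with a hand-rolled Kendall rank correlation; the system names and scores are invented.

```python
from itertools import combinations

def kendall_tau(scores_a: dict, scores_b: dict) -> float:
    """Kendall rank correlation between two score dicts keyed by system name."""
    systems = list(scores_a)
    concordant = discordant = 0
    for s1, s2 in combinations(systems, 2):
        da = scores_a[s1] - scores_a[s2]
        db = scores_b[s1] - scores_b[s2]
        if da * db > 0:
            concordant += 1
        elif da * db < 0:
            discordant += 1
    return (concordant - discordant) / max(concordant + discordant, 1)

# Hypothetical MAP scores with full judgments vs. judgments restricted to the original pool:
full = {"sysA": 0.31, "sysB": 0.28, "sysC": 0.22, "sysD": 0.19}
pooled = {"sysA": 0.27, "sysB": 0.26, "sysC": 0.17, "sysD": 0.18}
print(kendall_tau(full, pooled))  # rank correlation stays high, but pooled scores are systematically lower
```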
{"title":"On the Reusability of Open Test Collections","authors":"Seyyed Hadi Hashemi, C. Clarke, Adriel Dean-Hall, J. Kamps, Julia Kiseleva","doi":"10.1145/2766462.2767788","DOIUrl":"https://doi.org/10.1145/2766462.2767788","url":null,"abstract":"Creating test collections for modern search tasks is increasingly more challenging due to the growing scale and dynamic nature of content, and need for richer contextualization of the statements of request. To address these issues, the TREC Contextual Suggestion Track explored an open test collection, where participants were allowed to submit any web page as a result for a personalized venue recommendation task. This prompts the question on the reusability of the resulting test collection: How does the open nature affect the pooling process? Can participants reliably evaluate variant runs with the resulting qrels? Can other teams evaluate new runs reliably? In short, does the set of pooled and judged documents effectively produce a post hoc test collection? Our main findings are the following: First, while there is a strongly significant rank correlation, the effect of pooling is notable and results in underestimation of performance, implying the evaluation of non-pooled systems should be done with great care. Second, we extensively analyze impacts of open corpus on the fraction of judged documents, explaining how low recall affects the reusability, and how the personalization and low pooling depth aggravate that problem. Third, we outline a potential solution by deriving a fixed corpus from open web submissions.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"34 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131151302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Twitter Sentiment Analysis with Deep Convolutional Neural Networks
Aliaksei Severyn, Alessandro Moschitti
This paper describes our deep learning system for sentiment analysis of tweets. The main contribution of this work is a new model for initializing the parameter weights of the convolutional neural network, which is crucial to train an accurate model while avoiding the need to inject any additional features. Briefly, we use an unsupervised neural language model to train initial word embeddings that are further tuned by our deep learning model on a distant supervised corpus. At a final stage, the pre-trained parameters of the network are used to initialize the model. We train the latter on the supervised training data recently made available by the official system evaluation campaign on Twitter Sentiment Analysis organized by SemEval-2015. A comparison between the results of our approach and the systems participating in the challenge on the official test sets suggests that our model could be ranked in the first two positions in both the phrase-level subtask A (among 11 teams) and the message-level subtask B (among 40 teams). This is important evidence of the practical value of our solution.
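A minimal PyTorch sketch of the kind of network and three-stage initialisation the abstract describes is given below. The layer sizes, the single convolution, and the stand-in for pre-trained vectors are assumptions for illustration; they are not the authors' exact architecture or training setup.

```python
import torch
import torch.nn as nn

class TweetCNN(nn.Module):
    """Sentence-level CNN: embedding -> 1-D convolution -> max-over-time pooling -> linear classifier."""
    def __init__(self, vocab_size: int, embed_dim: int = 100, num_filters: int = 100,
                 kernel_size: int = 5, num_classes: int = 3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=kernel_size // 2)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x)).max(dim=2).values  # max-over-time pooling
        return self.fc(x)

model = TweetCNN(vocab_size=50_000)

# Stage 1: initialise embeddings from an unsupervised neural language model (stand-in random tensor here).
pretrained_vectors = torch.randn(50_000, 100)
model.embedding.weight.data.copy_(pretrained_vectors)

# Stage 2: pre-train the whole network on a distantly supervised corpus (e.g. emoticon-labelled tweets),
# then Stage 3: fine-tune on the small supervised SemEval training set, reusing the stage-2 weights.
# Both stages would use a standard cross-entropy training loop, omitted here.
```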
{"title":"Twitter Sentiment Analysis with Deep Convolutional Neural Networks","authors":"Aliaksei Severyn, Alessandro Moschitti","doi":"10.1145/2766462.2767830","DOIUrl":"https://doi.org/10.1145/2766462.2767830","url":null,"abstract":"This paper describes our deep learning system for sentiment analysis of tweets. The main contribution of this work is a new model for initializing the parameter weights of the convolutional neural network, which is crucial to train an accurate model while avoiding the need to inject any additional features. Briefly, we use an unsupervised neural language model to train initial word embeddings that are further tuned by our deep learning model on a distant supervised corpus. At a final stage, the pre-trained parameters of the network are used to initialize the model. We train the latter on the supervised training data recently made available by the official system evaluation campaign on Twitter Sentiment Analysis organized by Semeval-2015. A comparison between the results of our approach and the systems participating in the challenge on the official test sets, suggests that our model could be ranked in the first two positions in both the phrase-level subtask A (among 11 teams) and on the message-level subtask B (among 40 teams). This is an important evidence on the practical value of our solution.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131484161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 583
Session details: Session 7A: Assessing
J. Zobel
{"title":"Session details: Session 7A: Assessing","authors":"J. Zobel","doi":"10.1145/3255934","DOIUrl":"https://doi.org/10.1145/3255934","url":null,"abstract":"","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133327849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Finding Answers in Web Search
E. Yulianti
There are many informational queries that could be answered with a text passage, thereby not requiring the searcher to access the full web document. When building manual annotations of answer passages for TREC queries, Keikha et al. [6] confirmed that many such queries can be answered with just passages. By presenting answers directly on the search result page, user information needs are addressed more rapidly, which reduces user interaction (clicks) with the search result page [3] and has a significant positive effect on user satisfaction [2, 7]. In the context of general web search, the problem of finding answer passages has not been explored extensively. Retrieving relevant passages has been studied in the TREC HARD track [1] and in INEX [5], but relevant passages are not required to contain answers. One of the tasks in the TREC genomics track [4] was to find answer passages in biomedical literature. Previous work has shown that current passage retrieval methods that focus on topical relevance are not effective at finding answers [6]. Therefore, more knowledge is required to identify answers in a document. Bernstein et al. [2] have studied an approach to extract inline direct answers for search results using a paid crowdsourcing service. Such an approach, however, is expensive and not practical to apply to all possible information needs. A fully automatic process for finding answers remains a research challenge. The aim of this thesis is to find passages in documents that contain answers to a user's query. In this research, we propose to use a summarization technique that takes advantage of Community Question Answering (CQA) content. In our previous work, we have shown the benefit of using social media to generate more accurate summaries of web documents [8], but this was not designed to present answers in the summary. Given the high volume of questions and answers posted in CQA, we believe that many questions previously asked in CQA are the same as, or related to, actual web queries, and their best answers can guide us to extract answers from the document. As an initial work, we proposed using term distributions extracted from the best answers of top matching questions in one of the leading CQA sites, Yahoo! Answers (Y!A), to generate answer summaries. An experiment comparing our summaries with reference answers built in previous work [6] found some level of success, and a manuscript has been prepared for this result. Next, as an extension of the work above, we were interested in whether documents with better quality answer summaries should be ranked higher in the result list. A set of features derived from answer summaries is used to re-rank documents in the result list. Our experiment shows that answer summaries can be used to improve state-of-the-art document ranking, and the method outperforms a current re-ranking approach that uses comprehensive document quality features. A manuscript has been submitted for this result. In future work, we plan a deeper analysis of the top matching Y!A questions and their corresponding best answers, to better understand their benefit to the generated summaries and the re-ranking results, for example how the results differ with the relevance level of the Y!A best answers used to generate summaries. There are also opportunities to improve the selection of Y!A best answers for generating answer summaries, for example by predicting the quality of the Y!A best answers corresponding to a query. We also intend to incorporate a relevant Y!A page into the initial result list when there is a Y!A question that matches the query well. Finally, it is important to consider methods for generating answer summaries for queries that have no relevant results from CQA.
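A minimal sketch of the CQA-driven idea, assuming a simple unigram scoring rule that is not the thesis's actual method: build a term distribution from the best answers of top-matching Y!A questions, then rank a document's sentences by how well they cover that distribution.

```python
from collections import Counter
import math
import re

def term_distribution(best_answers: list) -> dict:
    """Unigram distribution over terms in the best answers of top-matching CQA questions."""
    counts = Counter(t for a in best_answers for t in re.findall(r"[a-z']+", a.lower()))
    total = sum(counts.values()) or 1
    return {t: c / total for t, c in counts.items()}

def answer_summary(document_sentences: list, dist: dict, max_sentences: int = 3) -> list:
    """Rank the document's sentences by coverage of the CQA-derived term distribution,
    with a mild length normalisation; the top sentences form the answer summary."""
    def score(sentence: str) -> float:
        terms = re.findall(r"[a-z']+", sentence.lower())
        return sum(dist.get(t, 0.0) for t in terms) / math.sqrt(len(terms) or 1)
    return sorted(document_sentences, key=score, reverse=True)[:max_sentences]
```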
{"title":"Finding Answers in Web Search","authors":"E. Yulianti","doi":"10.1145/2766462.2767846","DOIUrl":"https://doi.org/10.1145/2766462.2767846","url":null,"abstract":"There are many informational queries that could be answered with a text passage, thereby not requiring the searcher to access the full web document. When building manual annotations of answer passages for TREC queries, Keikha et al. [6] confirmed that many such queries can be answered with just passages. By presenting the answers directly in the search result page, user information needs will be addressed more rapidly so that reduces user interaction (click) with the search result page [3] and gives a significant positive effect on user satisfaction [2, 7]. In the context of general web search, the problem of finding answer passages has not been explored extensively. Retrieving relevant passages has been studied in TREC HARD track [1] and in INEX [5], but relevant passages are not required to contain answers. One of the tasks in the TREC genomics track [4] was to find answer passages on biomedical literature. Previous work has shown that current passage retrieval methods that focus on topical relevance are not effective at finding answers [6]. Therefore, more knowledge is required to identify answers in a document. Bernstein et al. [2] has studied an approach to extract inline direct answers for search result using paid crowdsourcing service. Such an approach, however, is expensive and not practical to be applied for all possible information needs. A fully automatic process in finding answers remains a research challenge. The aim of this thesis is to find passages in the documents that contain answers to a user's query. In this research, we proposed to use a summarization technique through taking advantage of Community Question Answering (CQA) content. In our previous work, we have shown the benefit of using social media to generate more accurate summaries of web documents [8], but this was not designed to present answer in the summary. With the high volume of questions and answers posted in CQA, we believe that there are many questions that have been previously asked in CQA that are the same as or related to actual web queries, for which their best answers can guide us to extract answers in the document. As an initial work, we proposed using term distributions extracted from best answers for top matching questions in one of leading CQA sites, Yahoo! Answers (Y!A), for answer summaries generation. An experiment was done by comparing our summaries with reference answers built in previous work [6], finding some level of success. A manuscript is prepared for this result. Next, as an extension of our work above, we were interested to see whether the documents that have better quality answer summaries should be ranked higher in the result list. A set of features are derived from answer summaries to re-rank documents in the result list. Our experiment shows that answer summaries can be used to improve state-of-the-art document ranking. The method is also shown to outperform a current re-ranking approach using comprehensive document quality features. 
A ","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133774279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GeoSoCa: Exploiting Geographical, Social and Categorical Correlations for Point-of-Interest Recommendations
Jiadong Zhang, Chi-Yin Chow
Recommending preferred points-of-interest (POIs), e.g., museums and restaurants, to users has become an important feature of location-based social networks (LBSNs): it helps people explore new places and helps businesses discover potential customers. However, because users check in at only a few POIs in an LBSN, the user-POI check-in interaction is highly sparse, which poses a big challenge for POI recommendation. To tackle this challenge, in this study we propose a new POI recommendation approach called GeoSoCa that exploits geographical, social and categorical correlations among users and POIs. These correlations can be learned from users' historical check-in data on POIs and used to predict a user's relevance score for an unvisited POI, so as to make recommendations. First, in GeoSoCa we propose a kernel estimation method with an adaptive bandwidth to determine a personalized check-in distribution of POIs for each user that naturally models the geographical correlations between POIs. Then, GeoSoCa aggregates the check-in frequency or rating of a user's friends on a POI and models the social check-in frequency or rating as a power-law distribution to exploit the social correlations between users. Further, GeoSoCa applies the bias of a user on a POI category to weight the popularity of a POI in the corresponding category and models the weighted popularity as a power-law distribution to leverage the categorical correlations between POIs. Finally, we conduct a comprehensive performance evaluation of GeoSoCa using two large-scale real-world check-in data sets collected from Foursquare and Yelp. Experimental results show that GeoSoCa achieves significantly superior recommendation quality compared to other state-of-the-art POI recommendation techniques.
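The geographical component can be pictured with a small kernel-density sketch; the fixed Gaussian bandwidth and the simple product fusion below are placeholders for the paper's adaptive, per-user bandwidth and its actual combination of the three correlation scores.

```python
import math

def gaussian_kernel(distance: float, h: float) -> float:
    """Gaussian kernel with bandwidth h."""
    return math.exp(-(distance * distance) / (2 * h * h)) / (h * math.sqrt(2 * math.pi))

def geographical_score(candidate_xy, user_checkin_xys, bandwidth: float = 1.0) -> float:
    """Kernel density estimate of the user's own check-in locations, evaluated at the candidate POI."""
    if not user_checkin_xys:
        return 0.0
    def distance(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return sum(gaussian_kernel(distance(candidate_xy, c), bandwidth)
               for c in user_checkin_xys) / len(user_checkin_xys)

def geosoca_style_score(geo: float, social: float, categorical: float) -> float:
    """Fuse the three correlation scores; a plain product is used here as a placeholder fusion rule."""
    return geo * social * categorical

# Hypothetical user with three prior check-ins, scoring a candidate POI near two of them:
checkins = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0)]
print(geosoca_style_score(geographical_score((0.05, 0.1), checkins), social=0.7, categorical=0.4))
```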
{"title":"GeoSoCa: Exploiting Geographical, Social and Categorical Correlations for Point-of-Interest Recommendations","authors":"Jiadong Zhang, Chi-Yin Chow","doi":"10.1145/2766462.2767711","DOIUrl":"https://doi.org/10.1145/2766462.2767711","url":null,"abstract":"Recommending users with their preferred points-of-interest (POIs), e.g., museums and restaurants, has become an important feature for location-based social networks (LBSNs), which benefits people to explore new places and businesses to discover potential customers. However, because users only check in a few POIs in an LBSN, the user-POI check-in interaction is highly sparse, which renders a big challenge for POI recommendations. To tackle this challenge, in this study we propose a new POI recommendation approach called GeoSoCa through exploiting geographical correlations, social correlations and categorical correlations among users and POIs. The geographical, social and categorical correlations can be learned from the historical check-in data of users on POIs and utilized to predict the relevance score of a user to an unvisited POI so as to make recommendations for users. First, in GeoSoCa we propose a kernel estimation method with an adaptive bandwidth to determine a personalized check-in distribution of POIs for each user that naturally models the geographical correlations between POIs. Then, GeoSoCa aggregates the check-in frequency or rating of a user's friends on a POI and models the social check-in frequency or rating as a power-law distribution to employ the social correlations between users. Further, GeoSoCa applies the bias of a user on a POI category to weigh the popularity of a POI in the corresponding category and models the weighed popularity as a power-law distribution to leverage the categorical correlations between POIs. Finally, we conduct a comprehensive performance evaluation for GeoSoCa using two large-scale real-world check-in data sets collected from Foursquare and Yelp. Experimental results show that GeoSoCa achieves significantly superior recommendation quality compared to other state-of-the-art POI recommendation techniques.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131506676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 297
Leveraging User Reviews to Improve Accuracy for Mobile App Retrieval
Dae Hoon Park, Mengwen Liu, ChengXiang Zhai, Haohong Wang
Smartphones and tablets, with their apps, have pervaded our everyday lives, creating a new demand for search tools that help users find the right apps to satisfy their immediate needs. While there are a few commercial mobile app search engines available, the new task of mobile app retrieval has not yet been rigorously studied. Indeed, there does not yet exist a test collection for quantitatively evaluating this new retrieval task. In this paper, we first study the effectiveness of state-of-the-art retrieval models for the app retrieval task using new app retrieval test data we created. We then propose and study a novel approach that generates a new representation for each app. Our key idea is to leverage user reviews to identify important features of apps and bridge the vocabulary gap between app developers and users. Specifically, we jointly model app descriptions and user reviews using a topic model in order to generate app representations while excluding noise in reviews. Experimental results indicate that the proposed approach is effective and outperforms the state-of-the-art retrieval models for app retrieval.
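As a hedged stand-in for the joint topic model, the sketch below mixes a description language model with a review language model so that user vocabulary enters the app representation; the interpolation weight and tokenisation are arbitrary choices for illustration, not the paper's model.

```python
from collections import Counter

def unigram_lm(text: str) -> dict:
    """Maximum-likelihood unigram language model over whitespace tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: c / total for t, c in counts.items()}

def app_representation(description: str, reviews: list, review_weight: float = 0.4) -> dict:
    """Interpolate the description LM with a review LM so that user vocabulary is represented."""
    desc_lm = unigram_lm(description)
    review_lm = unigram_lm(" ".join(reviews))
    vocab = set(desc_lm) | set(review_lm)
    return {t: (1 - review_weight) * desc_lm.get(t, 0.0) + review_weight * review_lm.get(t, 0.0)
            for t in vocab}

rep = app_representation("track your daily running distance",
                         ["great jogging app", "battery drains fast while jogging"])
print(sorted(rep, key=rep.get, reverse=True)[:5])
```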
{"title":"Leveraging User Reviews to Improve Accuracy for Mobile App Retrieval","authors":"Dae Hoon Park, Mengwen Liu, ChengXiang Zhai, Haohong Wang","doi":"10.1145/2766462.2767759","DOIUrl":"https://doi.org/10.1145/2766462.2767759","url":null,"abstract":"Smartphones and tablets with their apps pervaded our everyday life, leading to a new demand for search tools to help users find the right apps to satisfy their immediate needs. While there are a few commercial mobile app search engines available, the new task of mobile app retrieval has not yet been rigorously studied. Indeed, there does not yet exist a test collection for quantitatively evaluating this new retrieval task. In this paper, we first study the effectiveness of the state-of-the-art retrieval models for the app retrieval task using a new app retrieval test data we created. We then propose and study a novel approach that generates a new representation for each app. Our key idea is to leverage user reviews to find out important features of apps and bridge vocabulary gap between app developers and users. Specifically, we jointly model app descriptions and user reviews using topic model in order to generate app representations while excluding noise in reviews. Experiment results indicate that the proposed approach is effective and outperforms the state-of-the-art retrieval models for app retrieval.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133310244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 62
Towards a Game-Theoretic Framework for Information Retrieval
ChengXiang Zhai
The task of information retrieval (IR) has traditionally been defined as ranking a collection of documents in response to a query. While this definition has enabled most research progress in IR so far, it does not accurately model the actual retrieval task in a real IR application, where users tend to be engaged in an interactive process with multiple queries, and optimizing the overall performance of an IR system on an entire search session is far more important than its performance on an individual query. In this talk, I will present a new game-theoretic formulation of the IR problem where the key idea is to model information retrieval as a process in which a search engine and a user play a cooperative game, with a shared goal of satisfying the user's information need (or more generally helping the user complete a task) while minimizing the user's effort and the resource overhead on the retrieval system. Such a game-theoretic framework offers several benefits. First, it naturally suggests optimization of the overall utility of an interactive retrieval system over a whole search session, thus breaking the limitation of the traditional formulation that optimizes ranking of documents for a single query. Second, it models the interactions between users and a search engine, and thus can optimize the collaboration of a search engine and its users, maximizing the "combined intelligence" of a system and users. Finally, it can serve as a unified framework for optimizing both interactive information retrieval and active relevance judgment acquisition through crowdsourcing. I will discuss how the new framework can not only cover several emerging directions in current IR research as special cases, but also open up many interesting new research directions in IR.
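One hedged way to read the cooperative-game objective, purely as an illustration and not a formula from the talk, is a session-level utility that rewards information gain and penalises user effort and system resource cost:

```python
def session_utility(interactions: list, effort_weight: float = 1.0, cost_weight: float = 0.1) -> float:
    """Hypothetical session-level objective: total information gain to the user, minus weighted
    user effort and system resource cost. Field names and weights are invented for illustration."""
    return sum(step["gain"] - effort_weight * step["user_effort"] - cost_weight * step["system_cost"]
               for step in interactions)

# A toy two-query session: the second interaction delivers most of the gain at modest extra effort.
session = [
    {"gain": 0.2, "user_effort": 0.3, "system_cost": 0.1},
    {"gain": 0.9, "user_effort": 0.2, "system_cost": 0.1},
]
print(session_utility(session))
```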
{"title":"Towards a Game-Theoretic Framework for Information Retrieval","authors":"ChengXiang Zhai","doi":"10.1145/2766462.2767853","DOIUrl":"https://doi.org/10.1145/2766462.2767853","url":null,"abstract":"The task of information retrieval (IR) has traditionally been defined as to rank a collection of documents in response to a query. While this definition has enabled most research progress in IR so far, it does not model accurately the actual retrieval task in a real IR application, where users tend to be engaged in an interactive process with multipe queries, and optimizing the overall performance of an IR system on an entire search session is far more important than its performance on an individual query. In this talk, I will present a new game-theoretic formulation of the IR problem where the key idea is to model information retrieval as a process of a search engine and a user playing a cooperative game, with a shared goal of satisfying the user's information need (or more generally helping the user complete a task) while minimizing the user's effort and the resource overhead on the retrieval system. Such a game-theoretic framework offers several benefits. First, it naturally suggests optimization of the overall utility of an interactive retrieval system over a whole search session, thus breaking the limitation of the traditional formulation that optimizes ranking of documents for a single query. Second, it models the interactions between users and a search engine, and thus can optimize the collaboration of a search engine and its users, maximizing the \"combined intelligence\" of a system and users. Finally, it can serve as a unified framework for optimizing both interactive information retrieval and active relevance judgment acquisition through crowdsourcing. I will discuss how the new framework can not only cover several emerging directions in current IR research as special cases, but also open up many interesting new research directions in IR.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133888311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Comparing Approaches for Query Autocompletion
Giovanni Di Santo, R. McCreadie, C. Macdonald, I. Ounis
Within a search engine, query auto-completion aims to predict the final query the user wants to enter as they type, reducing query entry time and potentially allowing the search results to be prepared in advance of query submission. There are a large number of approaches to automatically rank candidate queries for the purposes of auto-completion. However, no study exists that compares these approaches on a single dataset. Hence, in this paper, we present a comparative study of current approaches to ranking candidate query completions as the user's query is typed. Using a query log and document corpus from a commercial medical search engine, we study the performance of 11 candidate query ranking approaches from the literature and analyze where they are effective. We show that the most effective approaches to query auto-completion are largely dependent on the number of characters that the user has typed so far, with the most effective approach differing for short and long prefixes. Moreover, we show that if personalized information is available about the searcher, this additional information can be used to more effectively rank query candidate completions, regardless of the prefix length.
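One of the standard query-log baselines that such a comparison would typically include, Most Popular Completion, can be sketched as follows; the toy log is invented, and this is offered as a generic baseline rather than a verbatim reproduction of any of the paper's 11 approaches.

```python
from collections import Counter

class MostPopularCompletion:
    """Rank completions of a typed prefix by their past frequency in a query log."""
    def __init__(self, query_log: list):
        self.freq = Counter(q.strip().lower() for q in query_log)

    def complete(self, prefix: str, k: int = 5) -> list:
        prefix = prefix.lower()
        candidates = [(q, n) for q, n in self.freq.items() if q.startswith(prefix)]
        return [q for q, _ in sorted(candidates, key=lambda qn: -qn[1])[:k]]

# Hypothetical medical query log:
qac = MostPopularCompletion(["chest pain", "chest pain causes", "cholesterol", "chest x ray", "chest pain"])
print(qac.complete("che"))  # most frequent matching past queries first
```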
{"title":"Comparing Approaches for Query Autocompletion","authors":"Giovanni Di Santo, R. McCreadie, C. Macdonald, I. Ounis","doi":"10.1145/2766462.2767829","DOIUrl":"https://doi.org/10.1145/2766462.2767829","url":null,"abstract":"Within a search engine, query auto-completion aims to predict the final query the user wants to enter as they type, with the aim of reducing query entry time and potentially preparing the search results in advance of query submission. There are a large number of approaches to automatically rank candidate queries for the purposes of auto-completion. However, no study exists that compares these approaches on a single dataset. Hence, in this paper, we present a comparison study between current approaches to rank candidate query completions for the user query as it is typed. Using a query-log and document corpus from a commercial medical search engine, we study the performance of 11 candidate query ranking approaches from the literature and analyze where they are effective. We show that the most effective approaches to query auto-completion are largely dependent on the number of characters that the user has typed so far, with the most effective approach differing for short and long prefixes. Moreover, we show that if personalized information is available about the searcher, this additional information can be used to more effectively rank query candidate completions, regardless of the prefix length.","PeriodicalId":297035,"journal":{"name":"Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122511803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26