
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval: latest papers

Faster and smaller inverted indices with treaps
Roberto Konow, G. Navarro, C. Clarke, A. López-Ortiz
We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods. To achieve compression we represent the treap topology using compact data structures. Further, the treap invariants allow us to elegantly encode differentially both document identifiers and frequencies. Results show that our index uses about 20% less space, and performs queries up to three times faster, than state-of-the-art compact representations.
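The core idea in the abstract above (a tree ordered by document identifier and heap-ordered by frequency, so that frequency thresholding can prune whole subtrees during traversal) can be illustrated with a small sketch. This is not the authors' compressed index: the names `Node`, `build_treap`, and `docs_with_freq_at_least` are hypothetical, the tree is stored with plain pointers rather than compact data structures, and differential encoding is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    doc_id: int                      # search-tree key
    freq: int                        # heap priority (max-heap)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_treap(postings: List[Tuple[int, int]]) -> Optional[Node]:
    """Build a treap from (doc_id, freq) pairs sorted by doc_id.

    Uses the standard stack-based Cartesian tree construction:
    in-order traversal yields increasing doc_id, and every node's
    freq is >= the freq of all nodes in its subtree.
    """
    stack: List[Node] = []
    for doc_id, freq in postings:
        node = Node(doc_id, freq)
        last = None
        # Pop nodes with smaller frequency; they become the left subtree.
        while stack and stack[-1].freq < freq:
            last = stack.pop()
        node.left = last
        if stack:
            stack[-1].right = node
        stack.append(node)
    return stack[0] if stack else None


def docs_with_freq_at_least(root: Optional[Node], threshold: int) -> List[int]:
    """Return doc_ids (ascending) whose frequency is >= threshold.

    A subtree is skipped as soon as its root falls below the threshold,
    because the heap order guarantees no descendant can qualify.
    """
    out: List[int] = []

    def visit(node: Optional[Node]) -> None:
        if node is None or node.freq < threshold:
            return
        visit(node.left)
        out.append(node.doc_id)
        visit(node.right)

    visit(root)
    return out
```

The pruning step is what lets identifier traversal and frequency thresholding happen in a single pass instead of two.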
DOI: https://doi.org/10.1145/2484028.2484088 (published 2013-07-28)
Citations: 32
Improving LDA topic models for microblogs via tweet pooling and automatic labeling
Rishabh Mehrotra, S. Sanner, Wray L. Buntine, Lexing Xie
Twitter, or the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In this paper, we investigate methods to improve topics learned from Twitter content without modifying the basic machinery of LDA; we achieve this through various pooling schemes that aggregate tweets in a data preprocessing step for LDA. We empirically establish that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures for topic coherence across three diverse Twitter datasets in comparison to an unmodified LDA baseline and a variety of pooling schemes. An additional contribution of automatic hashtag labeling further improves on the hashtag pooling results for a subset of metrics. Overall, these two novel schemes lead to significantly improved LDA topic models on Twitter content.
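Hashtag pooling as a preprocessing step can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the function name `pool_by_hashtag` and the regular expression are hypothetical, and returning untagged tweets separately is an assumption, since the abstract does not say how they are handled.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Group tweets into one pseudo-document per hashtag.

    Returns (pools, unpooled): pools maps a lower-cased hashtag to the
    concatenation of all tweets containing it; unpooled lists tweets
    without any hashtag. A tweet with several hashtags joins several pools.
    """
    pools: Dict[str, List[str]] = defaultdict(list)
    unpooled: List[str] = []
    for tweet in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(tweet)}
        if not tags:
            unpooled.append(tweet)
        for tag in tags:
            pools[tag].append(tweet)
    return {tag: " ".join(texts) for tag, texts in pools.items()}, unpooled
```

The pooled pseudo-documents would then be passed to an unmodified LDA implementation in place of the individual tweets.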
DOI: https://doi.org/10.1145/2484028.2484166 (published 2013-07-28)
Citations: 474
Collaborative factorization for recommender systems
Chaosheng Fan, Yanyan Lan, J. Guo, Zuoquan Lin, Xueqi Cheng
Recommender systems have become an effective tool for information filtering, usually providing the most useful items to users in a top-k ranked list. Traditional recommendation techniques such as Nearest Neighbors (NN) and Matrix Factorization (MF) have been widely used in real recommender systems. However, neither approach accomplishes the recommendation task well, because: (1) most NN methods leverage the neighbors' behaviors for prediction, which may suffer from severe data sparsity; (2) MF methods are less sensitive to sparsity, but neighbors' influences on latent factors are not fully explored, since the latent factors are often used independently. To overcome these problems, we propose a new framework for recommender systems, called collaborative factorization. It expresses the user as a combination of the user's own factors and those of the neighbors, called collaborative latent factors, and a ranking loss is then used for optimization. The advantage of our approach is that it enjoys the merits of both NN and MF methods. In this paper, we take the logistic loss in RankNet and the likelihood loss in ListMLE as examples, and the corresponding collaborative factorization methods are called CoF-Net and CoF-MLE. Our experimental results on three benchmark datasets show that they are more effective than several state-of-the-art recommendation methods.
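One way to read "collaborative latent factors" is sketched below. The abstract does not give the exact combination rule, so the mixing weight `alpha` and the plain average over neighbors are assumptions, and all function names are hypothetical. The pairwise loss is the standard RankNet-style logistic loss on a score difference.

```python
import math
from typing import Dict, List, Sequence


def collaborative_factor(
    user: str,
    neighbors: Sequence[str],
    factors: Dict[str, Sequence[float]],
    alpha: float = 0.5,
) -> List[float]:
    """Mix a user's own latent factors with the mean of the neighbors'.

    Assumption: a convex combination with weight alpha on the user's
    own factors; the paper may weight neighbors differently.
    """
    own = factors[user]
    if not neighbors:
        return list(own)
    mean = [
        sum(factors[n][k] for n in neighbors) / len(neighbors)
        for k in range(len(own))
    ]
    return [alpha * o + (1.0 - alpha) * m for o, m in zip(own, mean)]


def score(user_vec: Sequence[float], item_vec: Sequence[float]) -> float:
    """Predicted preference: inner product of user and item factors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def pairwise_logistic_loss(s_preferred: float, s_other: float) -> float:
    """RankNet-style loss: small when the preferred item scores higher."""
    return math.log1p(math.exp(-(s_preferred - s_other)))
```

In training, the loss would be summed over observed preference pairs and minimized with respect to both user and item factors.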
DOI: https://doi.org/10.1145/2484028.2484176 (published 2013-07-28)
Citations: 9
Kinship contextualization: utilizing the preceding and following structural elements
Muhammad Ali Norozi, Paavo Arvola
The textual context of an element, structurally, contains traces of evidence. Utilizing this context in scoring is called contextualization. In this study we hypothesize that the context of an XML element, originating from its preceding and following elements in the sequential ordering of a document, improves the quality of retrieval. In the tree form of the document's structure, kinship contextualization means contextualization based on the horizontal and vertical elements in the kinship tree, or elements in a closer to a wider structural kinship. We have tested several variants of kinship contextualization and verified notable improvements in comparison with the baseline system and gold standards in the retrieval of focused elements.
DOI: https://doi.org/10.1145/2484028.2484111 (published 2013-07-28)
Citations: 4
Exploring semi-automatic nugget extraction for Japanese one click access evaluation
Matthew Ekstrand-Abueg, Virgil Pavlu, Makoto P. Kato, T. Sakai, Takehiro Yamamoto, Mayu Iwata
Building test collections based on nuggets is useful for evaluating systems that return documents, answers, or summaries. However, nugget construction requires a lot of manual work and is not feasible for large query sets. Towards an efficient and scalable nugget-based evaluation, we study the applicability of semi-automatic nugget extraction in the context of the ongoing NTCIR One Click Access (1CLICK) task. We compare manually-extracted and semi-automatically-extracted Japanese nuggets to demonstrate the coverage and efficiency of the semi-automatic nugget extraction. Our findings suggest that the manual nugget extraction can be replaced with a direct adaptation of the English semi-automatic nugget extraction system, especially for queries for which the user desires broad answers from free-form text.
DOI: https://doi.org/10.1145/2484028.2484153 (published 2013-07-28)
Citations: 6
Competence-based song recommendation
L. Shou, Kuang Mao, Xinyuan Luo, Ke Chen, Gang Chen, Tianlei Hu
Singing is a popular social activity and a good way of expressing one's feelings. One important reason for an unsuccessful singing performance is that the singer fails to choose a suitable song. In this paper, we propose a novel singing competence-based song recommendation framework. It is distinguished from most existing music recommendation systems, which rely on the computation of listeners' interests or similarity. We model a singer's vocal competence as a singer profile, which takes voice pitch, intensity, and quality into consideration. Then we propose techniques to acquire singer profiles. We also present a song profile model which is used to construct a human-annotated song database. Finally, we propose a learning-to-rank scheme for recommending songs by singer profile. The experimental study on real singers demonstrates the effectiveness of our approach and its advantages over two baseline methods. To the best of our knowledge, our work is the first to study competence-based song recommendation.
DOI: https://doi.org/10.1145/2484028.2484048 (published 2013-07-28)
Citations: 6
Search result diversification in resource selection for federated search
Dzung Hong, Luo Si
Prior research in resource selection for federated search mainly focused on selecting a small number of information sources that are most relevant to a user query. However, result novelty and diversification are largely unexplored, which does not reflect the various kinds of information needs of users in real-world applications. This paper proposes two general approaches to model both result relevance and diversification in selecting sources, in order to provide more comprehensive coverage of multiple aspects of a user query. The first approach focuses on diversifying the document ranking on a centralized sample database before selecting information sources under the framework of Relevant Document Distribution Estimation (ReDDE). The second approach first evaluates the relevance of information sources with respect to each aspect of the query, and then ranks the sources based on the novelty and relevance that they offer. Both approaches can be applied with a wide range of existing resource selection algorithms such as ReDDE, CRCS, CORI and Big Document. Moreover, this paper proposes a learning-based approach to combine multiple resource selection algorithms for result diversification, which can further improve the performance. We propose a set of new metrics for resource selection in federated search to evaluate the diversification performance of different approaches. To the best of our knowledge, this is the first piece of work that addresses the problem of search result diversification in federated search. The effectiveness of the proposed approaches has been demonstrated by an extensive set of experiments on the federated search testbed of the Clueweb dataset.
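The second approach (score each source per query aspect, then rank sources by the novelty and relevance they add) could look roughly like the greedy sketch below. The selection rule here is an xQuAD-style heuristic chosen for illustration, which is an assumption and not necessarily the paper's formulation; `select_diverse_sources` and the per-aspect relevance table are hypothetical.

```python
from typing import Dict, List


def select_diverse_sources(rel: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Greedily pick k sources that cover the query aspects.

    rel[source][aspect] is an assumed relevance estimate in [0, 1].
    Each step picks the source with the highest relevance summed over
    aspects, where every aspect is discounted by how well the already
    selected sources cover it.
    """
    aspects = sorted({a for scores in rel.values() for a in scores})
    uncovered = {a: 1.0 for a in aspects}   # 1.0 means not covered at all
    remaining = set(rel)
    chosen: List[str] = []
    while remaining and len(chosen) < k:
        best = max(
            sorted(remaining),              # sorted for deterministic ties
            key=lambda s: sum(rel[s].get(a, 0.0) * uncovered[a] for a in aspects),
        )
        chosen.append(best)
        remaining.remove(best)
        for a in aspects:
            uncovered[a] *= 1.0 - rel[best].get(a, 0.0)
    return chosen
```

With this rule, a source that is relevant only to an aspect already covered by an earlier pick is pushed down in favor of one that covers a new aspect.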
DOI: https://doi.org/10.1145/2484028.2484091 (published 2013-07-28)
Citations: 17
Improving search result summaries by using searcher behavior data
Mikhail S. Ageev, Dmitry Lagun, Eugene Agichtein
Query-biased search result summaries, or "snippets", help users decide whether a result is relevant for their information need, and have become increasingly important for helping searchers with difficult or ambiguous search tasks. Previously published snippet generation algorithms have been primarily based on selecting document fragments most similar to the query, which does not take into account which parts of the document the searchers actually found useful. We present a new approach to improving result summaries by incorporating post-click searcher behavior data, such as mouse cursor movements and scrolling over the result documents. To achieve this aim, we develop a method for collecting behavioral data with precise association between searcher intent, document examination behavior, and the corresponding document fragments. In turn, this allows us to incorporate page examination behavior signals into a novel Behavior-Biased Snippet generation system (BeBS). By mining searcher examination data, BeBS infers document fragments of most interest to users, and combines this evidence with text-based features to select the most promising fragments for inclusion in the result summary. Our extensive experiments and analysis demonstrate that our method improves the quality of result summaries compared to existing state-of-the-art methods. We believe that this work opens a new direction for improving search result presentation, and we make available the code and the search behavior data used in this study to encourage further research in this area.
DOI: https://doi.org/10.1145/2484028.2484093 (published 2013-07-28)
Citations: 30
Characterizing stages of a multi-session complex search task through direct and indirect query modifications
Jiyin He, M. Bron, A. D. Vries
Search systems use context to effectively satisfy a user's information need as expressed by a query. Tasks are important factors in determining user context during search and many studies have been conducted that identify tasks and task stages through users' interaction behavior with search systems. The type of interaction available to users, however, depends on the type of search interface features available. Queries are the most pervasive input from users to express their information need regardless of the input method, e.g., typing keywords or clicking facets. Instead of characterizing interaction behavior in terms of interface specific components, we propose to characterize users' search behavior in terms of two types of query modification: (i) direct modification, which refers to reformulations of queries; and (ii) indirect modification, which refers to user operations on additional input components provided by various search interfaces. We investigate the utility of characterizing task stages through direct and indirect query reformulations in a case study and find that it is possible to effectively differentiate subsequent stages of the search task. We found that describing user interaction behavior in such a generic form allowed us to relate user actions to search task stages independent from the specific search interface deployed. The next step will then be to validate this idea in a setting with a wider palette of search tasks and tools.
DOI: https://doi.org/10.1145/2484028.2484178 (published 2013-07-28)
Citations: 16
Learning to personalize query auto-completion
Milad Shokouhi
Query auto-completion (QAC) is one of the most prominent features of modern search engines. The list of query candidates is generated according to the prefix entered by the user in the search box and is updated on each new keystroke. Query prefixes tend to be short and ambiguous, and existing models mostly rely on the past popularity of matching candidates for ranking. However, the popularity of certain queries may vary drastically across different demographics and users. For instance, while "instagram" and "imdb" have comparable popularity overall and are both legitimate candidates to show for the prefix "i", the former is noticeably more popular among young female users, and the latter is more likely to be issued by men. In this paper, we present a supervised framework for personalizing auto-completion ranking. We introduce a novel labelling strategy for generating offline training labels that can be used for learning personalized rankers. We compare the effectiveness of several user-specific and demographic-based features and show that, among them, the user's long-term search history and location are the most effective for personalizing auto-completion rankers. We perform our experiments on the publicly available AOL query logs, and also on the larger-scale logs of Bing. The results suggest that supervised rankers enhanced by personalization features can significantly outperform the existing popularity-based baselines, improving mean reciprocal rank (MRR) by up to 9%.
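The two ingredients the abstract names, a popularity-based baseline and the MRR metric, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `popularity_rank` and `mean_reciprocal_rank` and the toy query log are assumptions made purely for illustration.

```python
from collections import Counter


def popularity_rank(prefix, query_log):
    """Rank past queries that match the prefix, most frequent first (hypothetical baseline)."""
    counts = Counter(q for q in query_log if q.startswith(prefix))
    # Sort by descending count, then alphabetically so ties are deterministic.
    return [q for q, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of each target in its ranking; a missing target contributes 0."""
    total = 0.0
    for ranking, target in zip(rankings, targets):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(targets)


# Toy log (made-up data, not taken from the paper's experiments).
log = ["imdb", "imdb", "instagram", "ikea", "imdb", "instagram"]
ranking = popularity_rank("i", log)

# Two hypothetical users type "i": one ends up submitting "instagram", the other "imdb".
mrr = mean_reciprocal_rank([ranking, ranking], ["instagram", "imdb"])
print(ranking, mrr)
```

Because the baseline shows the same list to every user, the user who wanted "instagram" finds it only at rank 2. A personalized ranker, as described in the abstract, would aim to move the right candidate up for each user and thereby raise MRR.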
DOI: https://doi.org/10.1145/2484028.2484076 · Published: 2013-07-28
Citations: 189