Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval: latest papers

Search result diversification in resource selection for federated search
Dzung Hong, Luo Si
Prior research in resource selection for federated search has mainly focused on selecting a small number of information sources that are most relevant to a user query. However, result novelty and diversification are largely unexplored, which does not reflect the various kinds of information needs of users in real-world applications. This paper proposes two general approaches to model both result relevance and diversification in selecting sources, in order to provide more comprehensive coverage of multiple aspects of a user query. The first approach focuses on diversifying the document ranking on a centralized sample database before selecting information sources under the framework of Relevant Document Distribution Estimation (ReDDE). The second approach first evaluates the relevance of information sources with respect to each aspect of the query, and then ranks the sources based on the novelty and relevance that they offer. Both approaches can be applied with a wide range of existing resource selection algorithms such as ReDDE, CRCS, CORI and Big Document. Moreover, this paper proposes a learning-based approach to combine multiple resource selection algorithms for result diversification, which can further improve the performance. We propose a set of new metrics for resource selection in federated search to evaluate the diversification performance of different approaches. To the best of our knowledge, this is the first piece of work that addresses the problem of search result diversification in federated search. The effectiveness of the proposed approaches has been demonstrated by an extensive set of experiments on the federated search testbed of the ClueWeb dataset.
DOI: https://doi.org/10.1145/2484028.2484091, published 2013-07-28
Citations: 17
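The first approach in the abstract above (diversify the ranking on the centralized sample database, then select sources in a ReDDE-style framework) can be illustrated with a minimal sketch. This is a simplified vote-counting version, not the paper's formulation: the function name `redde_scores`, the fixed `top_n` cutoff, and the assumption that the ranked list has already been diversified upstream are all illustrative.

```python
from collections import defaultdict


def redde_scores(ranked_sample_docs, sample_sizes, collection_sizes, top_n):
    """Score sources from a ranked list of sampled documents.

    ranked_sample_docs: list of (doc_id, source) pairs, best first. The list
        is assumed to have been diversified already (hypothetical upstream step).
    sample_sizes / collection_sizes: documents per source in the centralized
        sample and in the (estimated) full collection.
    Each of the top_n sampled documents votes for its source; a vote is scaled
    by how many real documents one sampled document stands for.
    """
    scores = defaultdict(float)
    for _doc_id, source in ranked_sample_docs[:top_n]:
        scores[source] += collection_sizes[source] / sample_sizes[source]
    # Highest score first; ties broken by source name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = [("d1", "A"), ("d2", "B"), ("d3", "A"), ("d4", "C")]
selected = redde_scores(
    ranking,
    sample_sizes={"A": 2, "B": 1, "C": 1},
    collection_sizes={"A": 100, "B": 300, "C": 50},
    top_n=3,
)
```

Because the votes come from the top of the ranked list, any diversification applied to that list carries over to which sources receive votes.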
Improving search result summaries by using searcher behavior data
Mikhail S. Ageev, Dmitry Lagun, Eugene Agichtein
Query-biased search result summaries, or "snippets", help users decide whether a result is relevant for their information need, and have become increasingly important for helping searchers with difficult or ambiguous search tasks. Previously published snippet generation algorithms have been primarily based on selecting document fragments most similar to the query, which does not take into account which parts of the document the searchers actually found useful. We present a new approach to improving result summaries by incorporating post-click searcher behavior data, such as mouse cursor movements and scrolling over the result documents. To achieve this aim, we develop a method for collecting behavioral data with precise association between searcher intent, document examination behavior, and the corresponding document fragments. In turn, this allows us to incorporate page examination behavior signals into a novel Behavior-Biased Snippet generation system (BeBS). By mining searcher examination data, BeBS infers document fragments of most interest to users, and combines this evidence with text-based features to select the most promising fragments for inclusion in the result summary. Our extensive experiments and analysis demonstrate that our method improves the quality of result summaries compared to existing state-of-the-art methods. We believe that this work opens a new direction for improving search result presentation, and we make available the code and the search behavior data used in this study to encourage further research in this area.
DOI: https://doi.org/10.1145/2484028.2484093, published 2013-07-28
Citations: 30
Incorporating vertical results into search click models
Chao Wang, Yiqun Liu, Min Zhang, Shaoping Ma, Meihong Zheng, Jing Qian, Kuo Zhang
In modern search engines, an increasing number of search result pages (SERPs) are federated from multiple specialized search engines (called verticals, such as Image or Video). As an effective approach to interpreting users' click-through behavior as feedback information, most click models were designed to reduce the position bias and improve ranking performance of ordinary search results, which have homogeneous appearances. However, when vertical results are combined with ordinary ones, significant differences in presentation may lead to user behavior biases and thus failure of state-of-the-art click models. With the help of a popular commercial search engine in China, we collected a large-scale log data set which contains behavior information on both vertical and ordinary results. We also performed eye-tracking analysis to study users' real-world examining behavior. According to these analyses, we found that different result appearances may cause different behavior biases both for vertical results (local effect) and for the whole result lists (global effect). These biases include: examine bias for vertical results (especially those with multimedia components), trust bias for result lists with vertical results, and a higher probability of result revisitation for vertical results. Based on these findings, a novel click model considering these biases besides position bias was constructed to describe interaction with SERPs containing verticals. Experimental results show that the new Vertical-aware Click Model (VCM) is better at interpreting user click behavior on federated searches in terms of both log-likelihood and perplexity than existing models.
DOI: https://doi.org/10.1145/2484028.2484036, published 2013-07-28
Citations: 97
Workshop on health search and discovery: helping users and advancing medicine
Ryen W. White, E. Yom-Tov, E. Horvitz, Eugene Agichtein, W. Hersh
This workshop brings together researchers and practitioners from industry and academia to discuss search and discovery in the medical domain. The event focuses on ways to make medical and health information more accessible to laypeople (including enhancements to ranking algorithms and search interfaces), and how we can discover new medical facts and phenomena from information sought online, as evidenced in query streams and other sources such as social media. This domain also offers many opportunities for applications that monitor and improve quality of life of those affected by medical conditions, by providing tools to support their health-related information behavior.
DOI: https://doi.org/10.1145/2484028.2484220, published 2013-07-28
Citations: 2
Faster and smaller inverted indices with treaps
Roberto Konow, G. Navarro, C. Clarke, A. López-Ortiz
We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods. To achieve compression we represent the treap topology using compact data structures. Further, the treap invariants allow us to elegantly encode differentially both document identifiers and frequencies. Results show that our index uses about 20% less space, and performs queries up to three times faster, than state-of-the-art compact representations.
DOI: https://doi.org/10.1145/2484028.2484088, published 2013-07-28
Citations: 32
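The idea of thresholding by frequency while walking document identifiers, described in the treap abstract above, can be sketched with a plain pointer-based treap over one posting list. This is only an illustration under stated assumptions: the paper's index uses compact data structures and differential encoding, which are not reproduced here, and the names `Node`, `build_treap` and `docs_with_min_freq` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    doc_id: int
    freq: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_treap(postings):
    """Build a treap from (doc_id, freq) pairs sorted by doc_id.

    The tree is a binary search tree on doc_id and a max-heap on freq.
    Uses the standard stack-based Cartesian tree construction.
    """
    stack = []
    for doc_id, freq in postings:
        node = Node(doc_id, freq)
        last = None
        # Pop nodes with lower frequency; they end up below the new node.
        while stack and stack[-1].freq < freq:
            last = stack.pop()
        node.left = last
        if stack:
            stack[-1].right = node
        stack.append(node)
    return stack[0] if stack else None


def docs_with_min_freq(node, threshold, out=None):
    """Collect doc ids with freq >= threshold, in increasing doc_id order."""
    if out is None:
        out = []
    if node is None or node.freq < threshold:
        # Heap order: no descendant can have a higher frequency, so prune.
        return out
    docs_with_min_freq(node.left, threshold, out)
    out.append(node.doc_id)
    docs_with_min_freq(node.right, threshold, out)
    return out


root = build_treap([(1, 3), (4, 1), (7, 5), (9, 2), (12, 4)])
frequent = docs_with_min_freq(root, 3)
```

The pruning step is the point of the structure: a whole subtree is skipped as soon as its root falls below the frequency threshold, so identifiers and frequencies are filtered in one pass rather than two.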
Recommending personalized touristic sights using Google Places
Maya Sappelli, S. Verberne, Wessel Kraaij
The purpose of the Contextual Suggestion track, an evaluation task at the TREC 2012 conference, is to suggest personalized tourist activities to an individual, given a certain location and time. In our content-based approach, we collected initial recommendations using the location context as the search query in Google Places. We first ranked the recommendations based on their textual similarity to the user profiles. In order to improve the ranking of popular sights, we combined the initial ranking with rankings based on Google Search, popularity and categories. Finally, we performed filtering based on the temporal context. Overall, our system performed well above average and median, and outperformed the baseline run (Google Places only).
DOI: https://doi.org/10.1145/2484028.2484155, published 2013-07-28
Citations: 13
Collaborative factorization for recommender systems
Chaosheng Fan, Yanyan Lan, J. Guo, Zuoquan Lin, Xueqi Cheng
Recommender systems have become an effective tool for information filtering, usually providing the most useful items to users through a top-k ranking list. Traditional recommendation techniques such as Nearest Neighbors (NN) and Matrix Factorization (MF) have been widely used in real recommender systems. However, neither approach accomplishes the recommendation task well, because: (1) most NN methods leverage the neighbors' behaviors for prediction, which may suffer from severe data sparsity; (2) MF methods are less sensitive to sparsity, but neighbors' influences on latent factors are not fully explored, since the latent factors are often used independently. To overcome these problems, we propose a new framework for recommender systems, called collaborative factorization. It expresses the user as the combination of his or her own factors and those of the neighbors, called collaborative latent factors, and a ranking loss is then used for optimization. The advantage of our approach is that it enjoys the merits of both NN and MF methods. In this paper, we take the logistic loss in RankNet and the likelihood loss in ListMLE as examples, and the corresponding collaborative factorization methods are called CoF-Net and CoF-MLE. Our experimental results on three benchmark datasets show that they are more effective than several state-of-the-art recommendation methods.
DOI: https://doi.org/10.1145/2484028.2484176, published 2013-07-28
Citations: 9
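The notion of collaborative latent factors in the abstract above (a user represented by a combination of their own factors and those of their neighbors) can be sketched as follows. The mixing weight `alpha`, the equal weighting of neighbors, and all function names are assumptions made for illustration; the abstract does not give the paper's combination rule, and the ranking-loss training (CoF-Net, CoF-MLE) is not shown.

```python
def collaborative_user_factor(user, user_factors, neighbors, alpha=0.5):
    """Mix a user's own latent factors with the mean of the neighbors' factors.

    alpha is a hypothetical mixing weight: 1.0 means plain matrix
    factorization, 0.0 means the user is represented by neighbors only.
    """
    own = user_factors[user]
    nbrs = neighbors.get(user, [])
    if not nbrs:
        return list(own)
    mean = [
        sum(user_factors[n][k] for n in nbrs) / len(nbrs)
        for k in range(len(own))
    ]
    return [alpha * o + (1 - alpha) * m for o, m in zip(own, mean)]


def score(user_vec, item_vec):
    """Predicted preference: inner product of user and item factors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def top_k(user, user_factors, item_factors, neighbors, k, alpha=0.5):
    """Rank items for a user by the collaborative representation."""
    u = collaborative_user_factor(user, user_factors, neighbors, alpha)
    ranked = sorted(item_factors, key=lambda i: (-score(u, item_factors[i]), i))
    return ranked[:k]


user_factors = {"u": [1.0, 0.0], "v": [0.0, 1.0], "w": [0.0, 1.0]}
item_factors = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 0.2]}
neighbors = {"u": ["v", "w"]}
mixed = collaborative_user_factor("u", user_factors, neighbors)
recommended = top_k("u", user_factors, item_factors, neighbors, k=2)
```

In this toy setting the neighbors pull user `u` toward the second latent dimension, so item `b` outranks item `a` even though `u`'s own factors alone would score them equally.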
Exploring semi-automatic nugget extraction for Japanese one click access evaluation
Matthew Ekstrand-Abueg, Virgil Pavlu, Makoto P. Kato, T. Sakai, Takehiro Yamamoto, Mayu Iwata
Building test collections based on nuggets is useful for evaluating systems that return documents, answers, or summaries. However, nugget construction requires a lot of manual work and is not feasible for large query sets. Towards an efficient and scalable nugget-based evaluation, we study the applicability of semi-automatic nugget extraction in the context of the ongoing NTCIR One Click Access (1CLICK) task. We compare manually-extracted and semi-automatically-extracted Japanese nuggets to demonstrate the coverage and efficiency of the semi-automatic nugget extraction. Our findings suggest that the manual nugget extraction can be replaced with a direct adaptation of the English semi-automatic nugget extraction system, especially for queries for which the user desires broad answers from free-form text.
DOI: https://doi.org/10.1145/2484028.2484153, published 2013-07-28
Citations: 6
Informational friend recommendation in social media
Shengxian Wan, Yanyan Lan, J. Guo, Chaosheng Fan, Xueqi Cheng
It is well recognized that users rely on social media (e.g. Twitter or Digg) to fulfill two common needs (i.e. social need and informational need) that is to keep in touch with their friends in the real world and to have access to information they are interested in. Traditional friend recommendation methods in social media mainly focus on a user's social need, but seldom address their informational need (i.e. suggesting friends that can provide information one may be interested in but have not been able to obtain so far). In this paper, we propose to recommend friends according to the informational utility, which stands for the degree to which a friend satisfies the target user's unfulfilled informational need, called informational friend recommendation. In order to capture users' informational need, we view a post in social media as an item and utilize collaborative filtering techniques to predict the rating for each post. The candidate friends are then ranked according to their informational utility for recommendation. In addition, we also show how to further consider diversity in such recommendations. Experiments on benchmark datasets demonstrate that our approach can significantly outperform the traditional friend recommendation methods under informational evaluation measures.
DOI: https://doi.org/10.1145/2484028.2484179 (published 2013-07-28)
Citations: 48
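The ranking step described in the abstract of "Informational friend recommendation in social media" can be illustrated with a minimal sketch. The abstract does not give the utility formula, so the definition used here (the sum of predicted ratings over a candidate's posts that the target user has not yet seen) is an assumption made only for illustration. The predicted ratings are taken as given input, standing in for the output of whatever collaborative filtering model is used, and the names `rank_friends`, `predicted_rating`, `posts_by_author` and `seen_posts` are hypothetical.

```python
def rank_friends(predicted_rating, posts_by_author, seen_posts, candidates, top_k=3):
    """Rank candidate friends by an assumed informational utility.

    predicted_rating: dict post_id -> predicted rating for the target user
                      (assumed to come from a collaborative filtering model).
    posts_by_author:  dict user_id -> list of post_ids written by that user.
    seen_posts:       set of post_ids the target user can already access.
    candidates:       iterable of user_ids considered for recommendation.
    """
    scores = {}
    for friend in candidates:
        # Only posts the target user has not obtained yet can satisfy
        # an unfulfilled informational need.
        unseen = [p for p in posts_by_author.get(friend, []) if p not in seen_posts]
        # Assumed utility: total predicted rating over unseen posts.
        scores[friend] = sum(predicted_rating.get(p, 0.0) for p in unseen)
    # Sort by descending utility; break ties by user id for determinism.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:top_k], scores


if __name__ == "__main__":
    predicted = {"p1": 0.9, "p2": 0.2, "p3": 0.5, "p4": 0.8}
    authors = {"alice": ["p1", "p2"], "bob": ["p3"], "carol": ["p4"]}
    top, utility = rank_friends(predicted, authors, {"p1"}, ["alice", "bob", "carol"], top_k=2)
    print(top, utility)
```

A diversity-aware variant, which the abstract mentions but does not specify, would need an additional re-ranking step and is not attempted here.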
Improving LDA topic models for microblogs via tweet pooling and automatic labeling
Rishabh Mehrotra, S. Sanner, Wray L. Buntine, Lexing Xie
Twitter, or the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In this paper, we investigate methods to improve topics learned from Twitter content without modifying the basic machinery of LDA; we achieve this through various pooling schemes that aggregate tweets in a data preprocessing step for LDA. We empirically establish that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures of topic coherence across three diverse Twitter datasets, in comparison to an unmodified LDA baseline and a variety of other pooling schemes. An additional contribution, automatic hashtag labeling, further improves on the hashtag pooling results for a subset of metrics. Overall, these two novel schemes lead to significantly improved LDA topic models on Twitter content.
DOI: https://doi.org/10.1145/2484028.2484166 (published 2013-07-28)
Citations: 474
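The hashtag pooling idea from the abstract of "Improving LDA topic models for microblogs via tweet pooling and automatic labeling" can be sketched as a preprocessing step that merges tweets sharing a hashtag into one pseudo-document before any topic model is trained. This is a minimal sketch, not the authors' implementation: the function name `pool_by_hashtag`, the simple hashtag regular expression, the choice to add a multi-hashtag tweet to every matching pool, and the choice to keep untagged tweets as single-tweet documents are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Simplified hashtag pattern (an assumption; real tokenizers are more careful).
HASHTAG = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets):
    """Group tweets into pseudo-documents keyed by lower-cased hashtag.

    Returns a dict mapping a pool key to the concatenated tweet texts.
    """
    pools = defaultdict(list)
    for index, text in enumerate(tweets):
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if tags:
            # A tweet with several hashtags joins every matching pool.
            for tag in tags:
                pools["#" + tag].append(text)
        else:
            # Untagged tweets stay as their own one-tweet document.
            pools[f"tweet-{index}"].append(text)
    return {key: " ".join(texts) for key, texts in pools.items()}


if __name__ == "__main__":
    sample = [
        "LDA on short text #nlp",
        "Topic models rock #NLP #ml",
        "no tags here",
    ]
    for key, document in sorted(pool_by_hashtag(sample).items()):
        print(key, "->", document)
```

The pooled documents can then be tokenized and passed to an unmodified LDA implementation, which matches the abstract's point that only the input data changes, not the model.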