Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval: latest articles

SIGIR 2013 workshop on modeling user behavior for information retrieval evaluation
C. Clarke, Luanne Freund, Mark D. Smucker, Emine Yilmaz
The SIGIR 2013 Workshop on Modeling User Behavior for Information Retrieval Evaluation (MUBE 2013) brings together people to discuss existing and new approaches, ways to collaborate, and other ideas and issues involved in improving information retrieval evaluation through the modeling of user behavior.
DOI: https://doi.org/10.1145/2484028.2484222 (published 2013-07-28)
Citations: 3
Opportunity model for e-commerce recommendation: right product; right time
Jian Wang, Yi Zhang
Most existing e-commerce recommender systems aim to recommend the right product to a user, based on whether the user is likely to purchase or like a product. On the other hand, the effectiveness of recommendations also depends on the time of the recommendation. Let us take a user who just purchased a laptop as an example. She may purchase a replacement battery in 2 years (assuming that the laptop's original battery often fails to work around that time) and purchase a new laptop in another 2 years. In this case, it is not a good idea to recommend a new laptop or a replacement battery right after the user purchased the new laptop. It could hurt the user's satisfaction with the recommender system if she receives a potentially right product recommendation at the wrong time. We argue that a system should not only recommend the most relevant item, but also recommend at the right time. This paper studies the new problem: how to recommend the right product at the right time? We adapt the proportional hazards modeling approach in survival analysis to the recommendation research field and propose a new opportunity model to explicitly incorporate time in an e-commerce recommender system. The new model estimates the joint probability of a user making a follow-up purchase of a particular product at a particular time. This joint purchase probability can be leveraged by recommender systems in various scenarios, including the zero-query pull-based recommendation scenario (e.g. recommendation on an e-commerce web site) and a proactive push-based promotion scenario (e.g. email or text message based marketing). We evaluate the opportunity modeling approach with multiple metrics. Experimental results on data collected by a real-world e-commerce website (shop.com) show that it can predict a user's follow-up purchase behavior at a particular time with decent accuracy. In addition, the opportunity model significantly improves the conversion rate in pull-based systems and the user satisfaction/utility in push-based systems.
DOI: https://doi.org/10.1145/2484028.2484067 (published 2013-07-28)
Citations: 164
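The proportional hazards idea mentioned in the opportunity model abstract above can be illustrated with a minimal sketch. This is not the authors' model: the Weibull baseline hazard, the parameter names (`shape`, `scale`, `beta`) and the feature vector `x` are all assumptions chosen for illustration. The sketch computes the probability that a follow-up purchase happens within a time window, given that it has not happened yet.

```python
import math

def cumulative_baseline_hazard(t, shape, scale):
    # Weibull cumulative baseline hazard H0(t) = (t / scale) ** shape (an assumed form).
    return (t / scale) ** shape

def survival(t, x, beta, shape, scale):
    # Proportional hazards: covariates scale the baseline hazard by exp(beta . x),
    # so S(t | x) = exp(-H0(t) * exp(beta . x)).
    risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-cumulative_baseline_hazard(t, shape, scale) * risk)

def purchase_probability(t, window, x, beta, shape, scale):
    # P(purchase in (t, t + window] | no purchase up to t) = 1 - S(t + window) / S(t).
    s_now = survival(t, x, beta, shape, scale)
    s_later = survival(t + window, x, beta, shape, scale)
    return 1.0 - s_later / s_now

# Hypothetical numbers: time in days, a hazard that rises toward the two-year mark.
early = purchase_probability(1, 30, [1.0], [0.2], shape=3.0, scale=730.0)
late = purchase_probability(700, 30, [1.0], [0.2], shape=3.0, scale=730.0)
print(early < late)
```

With a rising hazard, a recommendation made close to the expected replacement time scores higher than one made right after the original purchase, which matches the laptop battery example in the abstract.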
Fast document-at-a-time query processing using two-tier indexes
Cristian Rossi, E. Moura, A. Carvalho, A. D. Silva
In this paper we present two new algorithms designed to reduce the overall time required to process top-k queries. These algorithms are based on the document-at-a-time approach and modify the best baseline we found in the literature, Blockmax WAND (BMW), to take advantage of a two-tiered index, in which the first tier is a small index containing only the higher impact entries of each inverted list. This small index is used to pre-process the query before accessing a larger index in the second tier, which considerably speeds up the whole process. The first algorithm we propose, named BMW-CS, achieves higher performance, but may result in small changes in the top results provided in the final ranking. The second algorithm, named BMW-t, preserves the top results and, while slower than BMW-CS, is faster than BMW. In our experiments, BMW-CS was more than 40 times faster than BMW when computing top 10 results, and, while it does not guarantee preserving the top results, it preserved all ranking results evaluated at this level.
DOI: https://doi.org/10.1145/2484028.2484085 (published 2013-07-28)
Citations: 30
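The two-tier idea in the abstract above (use a small high-impact index to pre-process the query, then consult the larger index) can be sketched roughly as follows. This is neither BMW-CS nor BMW-t and omits block-max skipping entirely; it is a simplified illustration that assumes non-negative scores, distinct query terms, and hypothetical helper names `split_tiers` and `top_k`. The first tier yields a score threshold, and documents whose upper bound falls below it are pruned before their second-tier entries are scored.

```python
import heapq

def split_tiers(index, cutoff):
    # Tier 1 keeps the high-impact postings; tier 2 keeps the remainder.
    tier1 = {t: {d: s for d, s in p.items() if s >= cutoff} for t, p in index.items()}
    tier2 = {t: {d: s for d, s in p.items() if s < cutoff} for t, p in index.items()}
    return tier1, tier2

def top_k(query, tier1, tier2, k):
    # Pass 1: tier-1 partial scores are lower bounds on the full scores.
    partial = {}
    for t in query:
        for d, s in tier1.get(t, {}).items():
            partial[d] = partial.get(d, 0.0) + s
    lower = heapq.nlargest(k, partial.values())
    threshold = lower[-1] if len(lower) == k else 0.0

    # The most that tier 2 can still contribute for each query term.
    t2_max = {t: max(tier2.get(t, {}).values(), default=0.0) for t in query}

    # Pass 2: score a document fully only if its upper bound reaches the threshold.
    # (A real index would skip over pruned postings instead of enumerating them.)
    candidates = set(partial)
    for t in query:
        candidates.update(tier2.get(t, {}))
    results = []
    for d in candidates:
        bound = partial.get(d, 0.0) + sum(
            t2_max[t] for t in query if d not in tier1.get(t, {}))
        if bound < threshold:
            continue  # cannot enter the top-k
        full = partial.get(d, 0.0) + sum(tier2.get(t, {}).get(d, 0.0) for t in query)
        results.append((full, d))
    return heapq.nlargest(k, results)

# Toy index: term -> {doc_id: impact score}.
index = {"a": {1: 5.0, 2: 1.0, 3: 0.5}, "b": {1: 4.0, 3: 3.5, 4: 0.2}}
tier1, tier2 = split_tiers(index, cutoff=3.0)
print(top_k(["a", "b"], tier1, tier2, k=2))  # [(9.0, 1), (4.0, 3)]
```

Because the threshold is a lower bound on the k-th best full score, this pruning never changes the top-k set, which mirrors the rank-preserving goal the abstract attributes to BMW-t rather than the faster, non-guaranteed BMW-CS.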
Multimedia recommendation: technology and techniques
Jialie Shen, Meng Wang, Shuicheng Yan, Peng Cui
In recent years, we have witnessed a rapid growth in the availability of digital multimedia on various application platforms and domains. Consequently, the problem of information overload has become more and more serious. In order to tackle the challenge, various multimedia recommendation technologies have been developed by different research communities (e.g., multimedia systems, information retrieval, machine learning and computer vision). Meanwhile, many commercial web systems (e.g., Flickr, YouTube, and Last.fm) have successfully applied recommendation techniques to provide users personalized content and services in a convenient and flexible way. Looking back, the information retrieval (IR) community has a long history of studying and contributing to recommender system design and related issues. It has been proven that recommender systems can effectively assist users in handling information overload and provide high-quality personalization. While several courses were dedicated to multimedia retrieval in the recent decade, to the best of our knowledge, the tutorial is the first one specifically focusing on multimedia recommender systems and their applications in various domains and media contents. We plan to summarize the research along this direction and provide an impetus for further research on this important topic.
DOI: https://doi.org/10.1145/2484028.2484194 (published 2013-07-28)
Citations: 11
Information seeking in digital cultural heritage with PATHS
M. Hall, Paul D. Clough, Samuel Fernando, Paula Goodale, Mark Stevenson, Eneko Agirre, Arantxa Otegi, Aitor Soroa Etxabe, K. Fernie, Jillian R. Griffiths, Runar Bergheim
Current Information Retrieval systems for digital cultural heritage support only the actual search aspect of the information seeking process. This demonstration presents the second PATHS system which provides the exploration, analysis, and sense-making features to support the full information seeking process.
DOI: https://doi.org/10.1145/2484028.2484210 (published 2013-07-28)
Citations: 4
Beliefs and biases in web search
Ryen W. White
People's beliefs, and unconscious biases that arise from those beliefs, influence their judgment, decision making, and actions, as is commonly accepted among psychologists. Biases can be observed in information retrieval in situations where searchers seek or are presented with information that significantly deviates from the truth. There is little understanding of the impact of such biases in search. In this paper we study search-related biases via multiple probes: an exploratory retrospective survey, human labeling of the captions and results returned by a Web search engine, and a large-scale log analysis of search behavior on that engine. Targeting yes-no questions in the critical domain of health search, we show that Web searchers exhibit their own biases and are also subject to bias from the search engine. We clearly observe searchers favoring positive information over negative, and more than expected given base rates based on consensus answers from physicians. We also show that search engines strongly favor a particular, usually positive, perspective, irrespective of the truth. Importantly, we show that these biases can be counterproductive and affect search outcomes; in our study, around half of the answers that searchers settled on were actually incorrect. Our findings have implications for search engine design, including the development of ranking algorithms that consider the desire to satisfy searchers (by validating their beliefs) alongside providing accurate answers and properly considering base rates. Incorporating likelihood information into search is particularly important for consequential tasks, such as those with a medical focus.
DOI: https://doi.org/10.1145/2484028.2484053 (published 2013-07-28)
Citations: 166
Semantic hashing using tags and topic modeling
Qifan Wang, Dan Zhang, Luo Si
It is an important research problem to design efficient and effective solutions for large scale similarity search. One popular strategy is to represent data examples as compact binary codes through semantic hashing, which has produced promising results with fast search speed and low storage cost. Many existing semantic hashing methods generate binary codes for documents by modeling document relationships based on similarity in a keyword feature space. Two major limitations in existing methods are: (1) Tag information is often associated with documents in many real world applications, but has not been fully exploited yet; (2) The similarity in keyword feature space does not fully reflect semantic relationships that go beyond keyword matching. This paper proposes a novel hashing approach, Semantic Hashing using Tags and Topic Modeling (SHTTM), to incorporate both the tag information and the similarity information from probabilistic topic modeling. In particular, a unified framework is designed for ensuring hashing codes to be consistent with tag information by a formal latent factor model and preserving the document topic/semantic similarity that goes beyond keyword matching. An iterative coordinate descent procedure is proposed for learning the optimal hashing codes. An extensive set of empirical studies on four different datasets has been conducted to demonstrate the advantages of the proposed SHTTM approach against several other state-of-the-art semantic hashing techniques. Furthermore, experimental results indicate that the modeling of tag information and utilizing topic modeling are beneficial for improving the effectiveness of hashing separately, while the combination of these two techniques in the unified framework obtains even better results.
DOI: https://doi.org/10.1145/2484028.2484037 (published 2013-07-28)
Citations: 54
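The reason compact binary codes give fast search and low storage, as the semantic hashing abstract above states, can be shown with a small sketch. It does not implement SHTTM or any learning step: the per-dimension thresholds, the function names, and the toy codes are all assumptions. It only shows how real-valued document representations become bit strings and how similarity search reduces to Hamming distance.

```python
def to_code(vector, thresholds):
    # Binarize each dimension against its threshold and pack the bits into an int.
    code = 0
    for value, threshold in zip(vector, thresholds):
        code = (code << 1) | (1 if value > threshold else 0)
    return code

def hamming(a, b):
    # Number of differing bits: XOR the codes, then count the set bits.
    return bin(a ^ b).count("1")

def nearest(query_code, codes, k):
    # Rank documents by Hamming distance to the query code (ties keep insertion order).
    return sorted(codes, key=lambda doc_id: hamming(query_code, codes[doc_id]))[:k]

# Hypothetical 4-bit codes for three documents.
codes = {"d1": 0b1010, "d2": 0b1000, "d3": 0b0101}
print(nearest(0b1011, codes, k=2))  # ['d1', 'd2']
```

Each document costs only a few bits, and a comparison is one XOR plus a bit count, which is why the quality of the learned codes (the subject of the paper) determines how useful this cheap search is.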
Toward self-correcting search engines: using underperforming queries to improve search
Ahmed Hassan Awadallah, Ryen W. White, Yi-Min Wang
Search engines receive queries with a broad range of different search intents. However, they do not perform equally well for all queries. Understanding where search engines perform poorly is critical for improving their performance. In this paper, we present a method for automatically identifying poorly-performing query groups where a search engine may not meet searcher needs. This allows us to create coherent query clusters that help system designers generate actionable insights about necessary changes and helps learning-to-rank algorithms better learn relevance signals via specialized rankers. The result is a framework capable of estimating dissatisfaction from Web search logs and learning to improve performance for dissatisfied queries. Through experimentation, we show that our method yields good quality groups that align with established retrieval performance metrics. We also show that we can significantly improve retrieval effectiveness via specialized rankers, and that coherent grouping of underperforming queries generated by our method is important in improving each group.
DOI: https://doi.org/10.1145/2484028.2484043 (published 2013-07-28)
Citations: 8
IRWR: incremental random walk with restart
Weiren Yu, Xuemin Lin
Random Walk with Restart (RWR) has become an appealing measure of node proximities in emerging applications, e.g., recommender systems and automatic image captioning. In practice, a real graph is typically large, and is frequently updated with small changes. It is often cost-inhibitive to recompute proximities from scratch via batch algorithms when the graph is updated. This paper focuses on the incremental computations of RWR in a dynamic graph, whose edges often change over time. The prior attempt of RWR [1] deploys K-dash to find the top-k highest proximity nodes for a given query, which involves a strategy to incrementally estimate upper proximity bounds. However, due to its aim to prune needless calculation, such an incremental strategy is approximate: in O(1) time for each node. The main contribution of this paper is to devise an exact and fast incremental algorithm of RWR for edge updates. Our solution, IRWR, can incrementally compute any node proximity in O(1) time for each edge update without loss of exactness. The empirical evaluations show the high efficiency and exactness of IRWR for computing proximities on dynamic networks against its batch counterparts.
随机行走与重启(RWR)已经成为新兴应用中节点接近度的一个有吸引力的度量eg推荐系统和自动图像字幕。在实践中,一个真正的图通常很大,并且经常更新一些小的变化。当图更新时,通过emph{批处理}算法从头开始重新计算接近度通常是成本抑制的。本文主要研究动态图中边随时间变化的RWR的增量计算。RWR[1]的先前尝试部署kdash来查找给定查询的top- $k$最高接近节点,这涉及到增量emph{估计}接近上限的策略。然而,由于其目的是减少不必要的计算,这种增量策略是emph{近似}的:在$O(1)$时间为每个节点。本文的主要贡献是设计了一种emph{精确}、快速的RWR增量边缘更新算法。我们的解决方案,IRWR !,可以在不损失准确性的情况下,在$O(1)$时间内增量地计算每次边缘更新的任何节点接近度。经验评价表明,相对于批量网络,IRWR在动态网络上计算接近度的效率和准确性较高。
DOI: https://doi.org/10.1145/2484028.2484114 (published 2013-07-28)
Citations: 30
Task differentiation for personal search evaluation
S. S. Sadeghi
DOI: https://doi.org/10.1145/2484028.2484236 (published 2013-07-28)
Citations: 0