
Proceedings of the 22nd ACM international conference on Information & Knowledge Management: latest papers

Supporting exploratory people search: a study of factor transparency and user control
Shuguang Han, Daqing He, Jiepu Jiang, Zhen Yue
People search has been an active research topic in recent years. Related work includes expert finding, collaborator recommendation, link prediction and social matching. However, the diverse objectives and exploratory nature of those tasks make it difficult to develop a flexible method for people search that works for every task. In this project, we developed PeopleExplorer, an interactive people search system to support exploratory search tasks when looking for people. In the system, users could specify their task objectives by selecting and adjusting key criteria. Three criteria were considered: the content relevance, the candidate authoritativeness and the social similarity between the user and the candidates. This project represents a first attempt to add transparency to exploratory people search, and to give users full control over the search process. The system was evaluated through an experiment with 24 participants undertaking four different tasks. The results show that with comparable time and effort, users of our system performed significantly better in their people search tasks than those using the baseline system. Users of our system also exhibited many unique behaviors in query reformulation and candidate selection. We found that users' general perceptions of the three criteria varied across tasks, which confirms our assumptions regarding modeling task difference and user variance in people search systems.
DOI: https://doi.org/10.1145/2505515.2505684 (published 2013-10-27)
Citations: 29
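The abstract above describes users steering the ranking by selecting and adjusting three criteria. A minimal sketch of that idea follows. The abstract does not give the scoring formula, so the weighted average, the `Candidate` fields, and the weight names are all assumptions made for illustration, not the PeopleExplorer implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    relevance: float   # content relevance to the query, assumed in [0, 1]
    authority: float   # candidate authoritativeness, assumed in [0, 1]
    social: float      # social similarity to the searcher, assumed in [0, 1]


def rank_candidates(candidates, w_relevance=1.0, w_authority=1.0, w_social=1.0):
    """Rank candidates by a weighted average of the three criteria.

    The linear combination is a hypothetical stand-in for whatever
    scoring the real system uses.
    """
    weights = (w_relevance, w_authority, w_social)
    if min(weights) < 0 or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)

    def score(c):
        return (w_relevance * c.relevance
                + w_authority * c.authority
                + w_social * c.social) / total

    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("A", relevance=0.9, authority=0.2, social=0.1),
    Candidate("B", relevance=0.5, authority=0.9, social=0.3),
    Candidate("C", relevance=0.4, authority=0.3, social=0.9),
]

# An expert-finding style task might emphasise authority ...
expert_first = rank_candidates(candidates, w_relevance=1.0, w_authority=3.0, w_social=0.0)
# ... while a collaborator-finding task might emphasise social similarity.
collab_first = rank_candidates(candidates, w_relevance=1.0, w_authority=0.0, w_social=3.0)

print([c.name for c in expert_first])
print([c.name for c in collab_first])
```

Exposing the weights as plain parameters is what makes the ranking transparent in this sketch: the same candidate pool yields a different order as soon as the task objective changes.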
User intent and assessor disagreement in web search evaluation
G. Kazai, Emine Yilmaz, Nick Craswell, S. Tahaghoghi
Preference-based methods for collecting relevance data for information retrieval (IR) evaluation have been shown to lead to better inter-assessor agreement than the traditional method of judging individual documents. However, little is known as to why preference judging reduces assessor disagreement and whether better agreement among assessors also means better agreement with user satisfaction, as signaled by user clicks. In this paper, we examine the relationship between assessor disagreement and various click-based measures, such as click preference strength and user intent similarity, for judgments collected from editorial judges and crowd workers using single absolute, pairwise absolute and pairwise preference-based judging methods. We find that trained judges are significantly more likely to agree with each other and with users than crowd workers, but inter-assessor agreement does not mean agreement with users. Switching to a pairwise judging mode improves crowdsourcing quality close to that of trained judges. We also find a relationship between intent similarity and assessor-user agreement, where the nature of the relationship changes across judging modes. Overall, our findings suggest that the awareness of different possible intents, enabled by pairwise judging, is a key reason for the improved agreement, and a crucial requirement when crowdsourcing relevance data.
DOI: https://doi.org/10.1145/2505515.2505716 (published 2013-10-27)
Citations: 15
Interest mining from user tweets
Thuy Vu, V. Perez
We build a system to extract user interests from Twitter messages. Specifically, we extract interest candidates using linguistic patterns and rank them using four different keyphrase ranking techniques: TFIDF, TextRank, LDA-TextRank, and Relevance-Interestingness-Rank (RI-Rank). We also explore the complementary relation between TFIDF and TextRank in ranking interest candidates. Top ranked interests are evaluated with user feedback gathered from an online survey. The results show that TFIDF and TextRank are both suitable for extracting user interests from tweets. Moreover, the combination of TFIDF and TextRank consistently yields the highest user positive feedback.
DOI: https://doi.org/10.1145/2505515.2507883 (published 2013-10-27)
Citations: 15
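The abstract above ranks interest candidates with TFIDF and TextRank and reports that combining the two works best. The sketch below shows one way such a combination could look. It assumes candidates have already been extracted per tweet; the co-occurrence graph, the smoothed IDF, and the product of normalised scores are illustrative choices, since the abstract does not state how the authors combine the two rankers.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tfidf_scores(user_tweets, background_users):
    """TF over one user's tweets, smoothed IDF over a collection of users."""
    tf = Counter(c for tweet in user_tweets for c in tweet)
    n_users = len(background_users)
    scores = {}
    for cand, freq in tf.items():
        # Document frequency: number of users who mention the candidate.
        df = sum(1 for tweets in background_users
                 if any(cand in tweet for tweet in tweets))
        scores[cand] = freq * math.log((1 + n_users) / (1 + df))
    return scores


def textrank_scores(user_tweets, damping=0.85, iterations=50):
    """PageRank by power iteration on an undirected co-occurrence graph."""
    neighbours = defaultdict(set)
    for tweet in user_tweets:
        for a, b in combinations(set(tweet), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    nodes = {c for tweet in user_tweets for c in tweet}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


def normalise(scores):
    """Scale scores so that the largest one becomes 1.0."""
    top = max(scores.values(), default=0.0)
    return {k: (v / top if top > 0 else 0.0) for k, v in scores.items()}


def combined_ranking(user_tweets, background_users):
    """Order candidates by the product of normalised TFIDF and TextRank."""
    tfidf = normalise(tfidf_scores(user_tweets, background_users))
    trank = normalise(textrank_scores(user_tweets))
    combined = {c: tfidf[c] * trank[c] for c in tfidf}
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical data: each tweet is a list of already-extracted candidates.
user = [["jazz", "coffee"], ["jazz", "vinyl"], ["jazz", "coffee", "monday"]]
background = [user, [["monday", "coffee"]], [["monday", "football"]]]

print(combined_ranking(user, background))
```

In this toy data "monday" is mentioned by every user, so its IDF is zero and it drops to the bottom even though it is well connected in the graph; that is the kind of complementary behaviour a combination is meant to exploit.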
Intelligent SSD: a turbo for big data mining
Duck-Ho Bae, Jin-Hyung Kim, Yong-Yeon Jo, Sang-Wook Kim, Hyun-Kyo Oh, Chanik Park
This paper introduces the notion of intelligent SSDs. First, we present the design considerations of intelligent SSDs, and then examine their potential benefits under various settings in data mining applications.
DOI: https://doi.org/10.1145/2505515.2507847 (published 2013-10-27)
Citations: 44
Mining entity attribute synonyms via compact clustering
Yanen Li, B. Hsu, ChengXiang Zhai, Kuansan Wang
Entity attribute values, such as "lord of the rings" for movie.title or "infant" for shoe.gender, are atomic components of entity expressions. Discovering alternative surface forms of attribute values is important for improving entity recognition and retrieval. In this work, we propose a novel compact clustering framework to jointly identify synonyms for a set of attribute values. The framework can integrate signals from multiple information sources into a similarity function between attribute values, and the weights of these signals are optimized in an unsupervised manner. Extensive experiments across multiple domains demonstrate the effectiveness of our clustering framework for mining entity attribute synonyms.
DOI: https://doi.org/10.1145/2505515.2505608 (published 2013-10-27)
Citations: 12
TellMyRelevance!: predicting the relevance of web search results from cursor interactions
Maximilian Speicher, A. Both, M. Gaedke
It is crucial for the success of a search-driven web application to answer users' queries in the best possible way. A common approach is to use click models for guessing the relevance of search results. However, these models are imprecise and forgo valuable information that can be gained from non-click user interactions. We introduce TellMyRelevance!, a novel automatic end-to-end pipeline for tracking cursor interactions at the client, analyzing them, and learning corresponding relevance models. However, these models depend on the layout of the search results page involved, which makes them difficult to evaluate and compare. Thus, we use a Random Mouse Cursor as an extension to our pipeline for generating layout-dependent baselines. Based on these, we can perform evaluations of real-world relevance models. A large-scale interaction log analysis showed that we can learn relevance models whose predictions compare favorably to predictions of an existing state-of-the-art click model.
DOI: https://doi.org/10.1145/2505515.2505703 (published 2013-10-27)
Citations: 22
Automated probabilistic modeling for relational data
Sameer Singh, T. Graepel
Probabilistic graphical model representations of relational data provide a number of desired features, such as inference of missing values, detection of errors, visualization of data, and probabilistic answers to relational queries. However, adoption has been slow due to the high level of expertise expected both in probability and in the domain from the user. Instead of requiring a domain expert to specify the probabilistic dependencies of the data, we present an approach that uses the relational DB schema to automatically construct a Bayesian graphical model for a database. This resulting model contains customized distributions for the attributes, latent variables that cluster the records, and factors that reflect and represent the foreign key links, whilst allowing efficient inference. Experiments demonstrate the accuracy of the model and scalability of inference on synthetic and real-world data.
DOI: https://doi.org/10.1145/2505515.2507828 (published 2013-10-27)
Citations: 11
An empirical study of top-n recommendation for venture finance
T. Stone, Weinan Zhang, Xiaoxue Zhao
This paper concerns the task of top-N investment opportunity recommendation in the domain of venture finance. By venture finance we mean, specifically, the investment activity of venture capital (VC) firms and their investment partners. We have access to a dataset of recorded venture financings (i.e., investments) by VCs and their investment partners in private US companies. This research was undertaken in partnership with Correlation Ventures, a venture capital firm that is pioneering the use of predictive analytics to better inform investment decision making. The paper undertakes a detailed empirical study and data analysis, and then demonstrates the efficacy of recommender systems in this novel application domain.
DOI: https://doi.org/10.1145/2505515.2507882 (published 2013-10-27)
Citations: 20
Improving pseudo-relevance feedback via tweet selection
Taiki Miyanishi, Kazuhiro Seki, K. Uehara
Query expansion methods using pseudo-relevance feedback have been shown to be effective for microblog search because they can solve vocabulary mismatch problems often seen in searching short documents such as Twitter messages (tweets), which are limited to 140 characters. Pseudo-relevance feedback assumes that the top ranked documents in the initial search results are relevant and that they contain topic-related words appropriate for relevance feedback. However, those assumptions do not always hold in reality because the initial search results often contain many irrelevant documents. In such a case, only a few of the suggested expansion words may be useful, with many others being useless or even harmful. To overcome the limitation of pseudo-relevance feedback for microblog search, we propose a novel query expansion method based on two-stage relevance feedback that models search interests by manual tweet selection and integration of lexical and temporal evidence into its relevance model. Our experiments using a corpus of microblog data (the Tweets2011 corpus) demonstrate that the proposed two-stage relevance feedback approaches considerably improve search result relevance over almost all topics.
DOI: https://doi.org/10.1145/2505515.2505701 (published 2013-10-27)
Citations: 57
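The failure mode described in the abstract above, where irrelevant top-ranked tweets contribute harmful expansion words, can be reproduced with a very small example. The sketch below is plain frequency-based query expansion, not the authors' two-stage model (which also uses temporal evidence); the function name `expand_query`, its parameters, and the toy tweets are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, top_k=3, n_expansion=2,
                 stopwords=frozenset()):
    """Append the most frequent new terms from the first top_k feedback docs."""
    counts = Counter()
    for doc in feedback_docs[:top_k]:
        counts.update(t for t in doc
                      if t not in query_terms and t not in stopwords)
    expansion = [term for term, _ in counts.most_common(n_expansion)]
    return list(query_terms) + expansion


query = ["egypt", "protest"]

# Hypothetical initial ranking: the two top results are off-topic.
ranked = [
    ["egypt", "holiday", "cheap", "flights"],
    ["egypt", "holiday", "cheap", "deals"],
    ["egypt", "protest", "cairo", "tahrir"],
    ["protest", "cairo", "tahrir", "mubarak"],
]

# Pseudo-relevance feedback: blindly trust the top-ranked tweets.
pseudo = expand_query(query, ranked, top_k=3)

# Manual selection: the searcher marks tweets 2 and 3 as relevant.
selected = [ranked[i] for i in (2, 3)]
manual = expand_query(query, selected, top_k=len(selected))

print(pseudo)
print(manual)
```

With the blind top-3 the expansion drifts toward travel vocabulary, while restricting feedback to the manually selected tweets keeps the added terms on topic.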
Building user profiles from topic models for personalised search
Morgan Harvey, F. Crestani, Mark James Carman
Personalisation is an important area in the field of IR that attempts to adapt ranking algorithms so that the results returned are tuned towards the searcher's interests. In this work we use query logs to build personalised ranking models in which user profiles are constructed based on the representation of clicked documents over a topic space. Instead of employing a human-generated ontology, we use novel latent topic models to determine these topics. Our experiments show that by subtly introducing user profiles as part of the ranking algorithm, rather than by re-ranking an existing list, we can provide personalised ranked lists of documents which improve significantly over a non-personalised baseline. Further examination shows that the performance of the personalised system is particularly good in cases where prior knowledge of the search query is limited.
DOI: https://doi.org/10.1145/2505515.2505642 (published 2013-10-27)
Citations: 97
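The abstract above builds user profiles from the topic representation of clicked documents. A rough sketch of that representation follows, assuming each document already comes with a topic distribution from some topic model. The averaging, the cosine similarity, and the interpolation weight `lam` are assumptions for illustration; in particular, interpolating with an existing query score is closer to the re-ranking setup the abstract contrasts against than to the authors' integrated ranking model.

```python
import math


def build_profile(clicked_doc_topics):
    """Average the topic distributions of a user's clicked documents."""
    n_topics = len(clicked_doc_topics[0])
    sums = [0.0] * n_topics
    for dist in clicked_doc_topics:
        for k, p in enumerate(dist):
            sums[k] += p
    return [s / len(clicked_doc_topics) for s in sums]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def personalised_score(query_score, doc_topics, profile, lam=0.3):
    """Interpolate a query-only score with profile similarity (hypothetical)."""
    return (1 - lam) * query_score + lam * cosine(profile, doc_topics)


# Hypothetical three-topic space: [sport, finance, cooking].
clicked = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]
profile = build_profile(clicked)

# Two candidate documents with their query-only scores.
doc_a = {"topics": [0.90, 0.05, 0.05], "query_score": 0.50}
doc_b = {"topics": [0.05, 0.90, 0.05], "query_score": 0.55}

score_a = personalised_score(doc_a["query_score"], doc_a["topics"], profile)
score_b = personalised_score(doc_b["query_score"], doc_b["topics"], profile)
print(score_a, score_b)
```

Here the sport-heavy profile lifts document A above document B even though B has the slightly higher query-only score.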