Proceedings of the 22nd ACM international conference on Information & Knowledge Management: Latest Papers

Seeking provenance of information using social media
Pritam Gundecha, Zhuo Feng, Huan Liu
Social media propagates breaking news and disinformation alike, quickly and on an unsurpassed scale. Because of its democratizing nature, social media users can easily produce, receive, and propagate a piece of information without necessarily providing traceable information. Thus, a user has no means to verify the provenance (i.e., the sources or originators) of information. Disinformation can have tragic consequences for society and individuals. This work aims to take advantage of the characteristics of social media to address this lack of traceable information. Knowledge of provenance can provide additional context for received information, so that a user can assess how much value, trust, and validity should be placed in it. In this paper, we study a novel research problem: seeking the provenance of information given only a few known recipients (less than 1% of the total recipients) by recovering the paths the information has taken from its originators. The proposed methodology exploits easily computable node centralities of a large social media network. Experimental results with Facebook and Twitter datasets show that the proposed mechanism is effective in correctly identifying additional recipients and seeking the provenance of information.
DOI: https://doi.org/10.1145/2505515.2505633 (published 2013-10-27)
Citations: 23
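The abstract above mentions "easily computable node centralities" without naming them. As a minimal sketch, assuming degree centrality is one such measure (an assumption, not something the abstract confirms), the following computes it for an undirected edge list. The function name `degree_centrality` and the toy edges are hypothetical.

```python
def degree_centrality(edges):
    """Degree centrality for an undirected graph given as (u, v) pairs.

    Each node's score is its number of distinct neighbors divided by
    (n - 1), where n is the number of nodes seen in the edge list.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(ns) / (n - 1) for node, ns in neighbors.items()}


# Hypothetical star-shaped network: "a" is connected to everyone else.
TOY_EDGES = [("a", "b"), ("a", "c"), ("a", "d")]
```

Degree centrality needs only one pass over the edges, which is the sense in which such measures are cheap on a large network; how the paper combines centralities to recover paths is not described in the abstract and is not sketched here.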
PIKM 2013: the 6th ACM workshop for ph.d. students in information and knowledge management
Fabian M. Suchanek, A. Nica
The PIKM workshop gives Ph.D. students an opportunity to present their dissertation proposals on a global stage. Like CIKM, the PIKM workshop covers a wide range of topics in the areas of databases, information retrieval, and knowledge management. Interdisciplinary work across these tracks is particularly encouraged.
DOI: https://doi.org/10.1145/2505515.2505817 (published 2013-10-27)
Citations: 1
Modeling temporal effects of human mobile behavior on location-based social networks
Huiji Gao, Jiliang Tang, Xia Hu, Huan Liu
The rapid growth of location-based social networks (LBSNs) invigorates an increasing number of LBSN users, providing an unprecedented opportunity to study human mobile behavior from spatial, temporal, and social aspects. Among these aspects, temporal effects offer an essential contextual cue for inferring a user's movement. Strong temporal cyclic patterns have been observed in user movement in LBSNs with their correlated spatial and social effects (i.e., temporal correlations). It is a propitious time to model these temporal effects (patterns and correlations) on a user's mobile behavior. In this paper, we present the first comprehensive study of temporal effects on LBSNs. We propose a general framework to exploit and model temporal cyclic patterns and their relationships with spatial and social data. The experimental results on two real-world LBSN datasets validate the power of temporal effects in capturing user mobile behavior, and demonstrate the ability of our framework to select the most effective location prediction algorithm under various combinations of prediction models.
DOI: https://doi.org/10.1145/2505515.2505616 (published 2013-10-27)
Citations: 100
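To make the idea of a temporal cyclic pattern in the abstract above concrete, here is a minimal sketch that turns a user's check-in timestamps into an hour-of-day distribution. It is only an illustration of a daily cycle, not the authors' framework; the function name `hourly_profile` and the ISO 8601 timestamp format are assumptions.

```python
from collections import Counter
from datetime import datetime


def hourly_profile(timestamps):
    """Return a 24-element list: the fraction of check-ins in each hour of day.

    `timestamps` is an iterable of ISO 8601 strings such as
    "2013-10-27T08:15:00" (an assumed input format).
    """
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [counts.get(hour, 0) / total for hour in range(24)]


# Hypothetical check-ins: mostly around 08:00, one in the evening.
TOY_CHECKINS = [
    "2013-10-27T08:15:00",
    "2013-10-28T08:40:00",
    "2013-10-28T19:05:00",
    "2013-10-29T08:01:00",
]
```

A strongly peaked profile like the one this toy data produces is the kind of signal a location predictor could condition on; the same approach extends to day-of-week by replacing `.hour` with `.weekday()`.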
Merged aggregate nearest neighbor query processing in road networks
Weiwei Sun, Chong Chen, Baihua Zheng, Chunan Chen, Liang Zhu, Weimo Liu, Y. Huang
Aggregate nearest neighbor query, which returns a common interesting point that minimizes the aggregate distance for a given query point set, is one of the most important operations in spatial databases and their application domains. This paper addresses the problem of finding the aggregate nearest neighbor for a merged set that consists of the given query point set and multiple points that need to be selected from a candidate set, which we call the merged aggregate nearest neighbor (MANN) query. This paper proposes an effective algorithm to process MANN queries in road networks based on our pruning strategies. Extensive experiments examine the behavior of the solutions, and the overall results show that our strategies for minimizing response time are effective and achieve speedups of several orders of magnitude over the baseline methods.
DOI: https://doi.org/10.1145/2505515.2505738 (published 2013-10-27)
Citations: 19
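The plain aggregate nearest neighbor query defined in the abstract above can be sketched as a brute-force baseline: run Dijkstra from each query point and pick the candidate with the smallest summed network distance. This is not the MANN algorithm or the authors' pruning strategies, which the abstract does not detail. The sum aggregate, the adjacency-dict graph format, and all names below are assumptions for illustration.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra over an adjacency dict {node: {neighbor: edge_weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def aggregate_nearest_neighbor(graph, query_points, candidates):
    """Return (candidate, total) minimizing the sum of network distances
    from every query point; unreachable candidates count as infinity."""
    per_query = [shortest_distances(graph, q) for q in query_points]
    best, best_total = None, float("inf")
    for p in candidates:
        total = sum(d.get(p, float("inf")) for d in per_query)
        if total < best_total:
            best, best_total = p, total
    return best, best_total


# Hypothetical undirected road network with symmetric edge weights.
ROADS = {
    "a": {"b": 2, "c": 5},
    "b": {"a": 2, "c": 1, "d": 4},
    "c": {"a": 5, "b": 1, "d": 1},
    "d": {"b": 4, "c": 1},
}
```

This baseline runs one full shortest-path search per query point, which is the kind of cost that pruning-based methods aim to avoid on large road networks.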
Graph-of-word and TW-IDF: new approach to ad hoc IR
F. Rousseau, M. Vazirgiannis
In this paper, we introduce a novel document representation (graph-of-word) and retrieval model (TW-IDF) for ad hoc IR. Questioning the term independence assumption behind the traditional bag-of-word model, we propose a different representation of a document that captures the relationships between the terms using an unweighted directed graph of terms. From this graph, at indexing time, we extract meaningful term weights (TW) that replace traditional term frequencies (TF) and from which we define a novel scoring function, namely TW-IDF, by analogy with TF-IDF. This approach leads to a retrieval model that consistently and significantly outperforms BM25 and in some cases its extension BM25+ on various standard TREC datasets. In particular, experiments show that counting the number of different contexts in which a term occurs inside a document is more effective and relevant to search than considering an overall concave term frequency in the context of ad hoc IR.
DOI: https://doi.org/10.1145/2505515.2505671 (published 2013-10-27)
Citations: 147
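The following is a rough sketch of the graph-of-word idea from the abstract above: an unweighted directed graph over a document's terms, with a term weight read off the graph and combined with IDF by analogy with TF-IDF. Several details are assumptions rather than facts from the abstract: edges are drawn from each term to the terms that follow it inside a sliding window, indegree is used as the term weight, the IDF form is a common textbook variant, and no document-length normalization is applied. All function names are hypothetical.

```python
import math
from collections import defaultdict


def graph_of_word(tokens, window=4):
    """Map each term to the set of distinct terms that follow it within
    `window` tokens (the window counts the term itself)."""
    successors = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:  # no self-loops
                successors[term].add(other)
    return successors


def indegree_weights(tokens, window=4):
    """Term weight = number of distinct terms with an edge into the term."""
    weights = {term: 0 for term in tokens}
    for outs in graph_of_word(tokens, window).values():
        for target in outs:
            weights[target] += 1
    return weights


def tw_idf(query_terms, doc_tokens, doc_freq, n_docs, window=4):
    """Sum of term weight times log((N + 1) / df) over matching query terms.

    The IDF form is an assumed variant, chosen only for illustration.
    """
    weights = indegree_weights(doc_tokens, window)
    score = 0.0
    for term in query_terms:
        if term in weights and doc_freq.get(term, 0) > 0:
            score += weights[term] * math.log((n_docs + 1) / doc_freq[term])
    return score
```

Because edges are unweighted and stored as sets, repeating a term next to the same neighbors does not raise its weight; only appearing in new contexts does, which matches the abstract's point about counting distinct contexts rather than raw frequency.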
Joint learning on sentiment and emotion classification
Wei Gao, Shoushan Li, Sophia Yat-Mei Lee, Guodong Zhou, Chu-Ren Huang
Sentiment and emotion classification have been widely but separately studied in natural language processing. In this paper, we address joint learning on sentiment and emotion classification where labeled data for both sentiment and emotion classification are available. The objective of this joint learning is to let the two tasks benefit from each other and improve their performance. Specifically, an extra data set that is annotated with both sentiment and emotion labels is employed to estimate the transformation probability between the two kinds of labels. Furthermore, the transformation probability is leveraged to transfer the classification labels so that the two tasks benefit from each other. Empirical studies demonstrate the effectiveness of our approach for the novel joint learning task.
DOI: https://doi.org/10.1145/2505515.2507830 (published 2013-10-27)
Citations: 37
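The abstract above describes estimating a transformation probability between sentiment and emotion labels from a data set annotated with both. One simple way to do that, assumed here because the abstract does not name an estimator, is relative frequency: P(emotion | sentiment) = count(sentiment, emotion) / count(sentiment). The sketch below also shows how such a table could map a sentiment distribution to an emotion distribution. Function names and label strings are hypothetical.

```python
from collections import Counter, defaultdict


def transformation_probabilities(pairs):
    """Estimate P(emotion | sentiment) by relative frequency.

    `pairs` is a list of (sentiment_label, emotion_label) tuples taken
    from documents annotated with both kinds of labels.
    """
    joint = Counter(pairs)
    marginal = Counter(sentiment for sentiment, _ in pairs)
    probs = defaultdict(dict)
    for (sentiment, emotion), n in joint.items():
        probs[sentiment][emotion] = n / marginal[sentiment]
    return dict(probs)


def transfer(sentiment_dist, probs):
    """P(e) = sum over s of P(s) * P(e | s)."""
    out = defaultdict(float)
    for sentiment, p_s in sentiment_dist.items():
        for emotion, p_e in probs.get(sentiment, {}).items():
            out[emotion] += p_s * p_e
    return dict(out)


# Hypothetical co-annotated sample.
TOY_PAIRS = [
    ("pos", "joy"),
    ("pos", "joy"),
    ("pos", "surprise"),
    ("neg", "anger"),
]
```

The reverse direction, P(sentiment | emotion), follows from the same function by swapping the elements of each pair.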
Zero-shot video retrieval using content and concepts
Jeffrey Dalton, James Allan, P. Mirajkar
Recent research in video retrieval has been successful at finding videos when the query consists of tens or hundreds of sample relevant videos for training supervised models. Instead, we investigate unsupervised zero-shot retrieval where no training videos are provided: a query consists only of a text statement. For retrieval, we use text extracted from images in the videos, text recognized in the speech of their audio tracks, as well as automatically detected semantically meaningful visual video concepts identified with widely varying confidence in the videos. In this work, we introduce a new method for automatically identifying relevant concepts given a text query using the Markov Random Field (MRF) retrieval framework. We use source expansion to build rich textual representations of semantic video concepts from large external sources such as the web. We find that concept-based retrieval significantly outperforms text-based approaches in recall. Using an evaluation derived from the TRECVID MED'11 track, we present early results showing that an approach using multi-modal fusion can compensate for inadequacies in each modality, resulting in substantial effectiveness gains. With relevance feedback, our approach provides additional improvements of over 50%.
DOI: https://doi.org/10.1145/2505515.2507880 (published 2013-10-27)
Citations: 76
Beyond data: from user information to business value through personalized recommendations and consumer science
X. Amatriain
Since the $1 million Netflix Prize was announced in 2006, Netflix has been known for having personalization at the core of our product. Our current product offering now focuses on instant video streaming, and our data is now many orders of magnitude larger. Not only do we have many more users in many more countries, but we also receive many more streams of data. Besides the ratings, we now also use information such as what our members play, browse, or search. In this paper, I will discuss the different approaches we follow to deal with these large streams of user data in order to extract information for personalizing our service. I will describe some of the machine learning models used, and their application in the service. I will also describe our data-driven approach to innovation that combines rapid offline explorations with online A/B testing. This approach enables us to convert user information into real and measurable business value.
DOI: https://doi.org/10.1145/2505515.2514701 (published 2013-10-27)
Citations: 33
Improving pseudo-relevance feedback via tweet selection
Taiki Miyanishi, Kazuhiro Seki, K. Uehara
Query expansion methods using pseudo-relevance feedback have been shown to be effective for microblog search because they can solve vocabulary mismatch problems often seen in searching short documents such as Twitter messages (tweets), which are limited to 140 characters. Pseudo-relevance feedback assumes that the top-ranked documents in the initial search results are relevant and that they contain topic-related words appropriate for relevance feedback. However, those assumptions do not always hold in reality because the initial search results often contain many irrelevant documents. In such a case, only a few of the suggested expansion words may be useful, with many others being useless or even harmful. To overcome the limitation of pseudo-relevance feedback for microblog search, we propose a novel query expansion method based on two-stage relevance feedback that models search interests by manual tweet selection and integration of lexical and temporal evidence into its relevance model. Our experiments using a corpus of microblog data (the Tweets2011 corpus) demonstrate that the proposed two-stage relevance feedback approaches considerably improve search result relevance over almost all topics.
DOI: https://doi.org/10.1145/2505515.2505701 (published 2013-10-27)
Citations: 57
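The baseline that the abstract above sets out to improve, plain pseudo-relevance feedback, can be sketched in a few lines: treat the top-ranked documents as relevant and add their most frequent non-query terms to the query. This is not the authors' two-stage method with manual tweet selection and temporal evidence. The function name `expand_query`, its parameters, and the toy tweets are all hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k=3, n_expansion=2):
    """Naive pseudo-relevance feedback.

    Assumes the first `top_k` documents in `ranked_docs` (each a list of
    tokens, best first) are relevant, and appends the `n_expansion` most
    frequent terms from them that are not already in the query.
    """
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(term for term in doc if term not in query_terms)
    expansion = [term for term, _ in counts.most_common(n_expansion)]
    return list(query_terms) + expansion


# Hypothetical initial ranking for the query ["quake"]; the fourth
# document is irrelevant but falls outside the top_k cutoff.
TOY_RANKING = [
    ["quake", "japan", "tsunami"],
    ["quake", "tsunami", "alert"],
    ["quake", "japan", "tsunami"],
    ["cat", "video"],
]
```

If an irrelevant document did land inside the top `top_k`, its terms would be counted like any others, which is exactly the weakness the abstract describes.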
Augmenting web search surrogates with images
Robert G. Capra, Jaime Arguello, Falk Scholer
While images are commonly used in search result presentation for vertical domains such as shopping and news, web search result surrogates remain primarily text-based. In this paper, we present results of two large-scale user studies to examine the effects of augmenting text-based surrogates with images extracted from the underlying webpage. We evaluate effectiveness and efficiency at both the individual surrogate level and the results page level. Additionally, we investigate the influence of two factors: the goodness of the image in terms of representing the underlying page content, and the diversity of the results on a results page. Our results show that at the individual surrogate level, good images provide only a small benefit in judgment accuracy versus text-only surrogates, with a slight increase in judgment time. At the results page level, surrogates with good images had similar effectiveness and efficiency compared to the text-only condition. However, in situations where the results page items had diverse senses, surrogates with images had higher click precision versus text-only ones. Results of these studies show tradeoffs in the use of images in web search surrogates, and highlight particular situations where they can provide benefits.
DOI: https://doi.org/10.1145/2505515.2505714 (published 2013-10-27)
Citations: 18