
Latest publications from the Proceedings of the 22nd ACM international conference on Information & Knowledge Management

Where shall we go today?: planning touristic tours with tripbuilder
Igo Ramalho Brilhante, J. Macêdo, F. M. Nardini, R. Perego, C. Renso
In this paper we propose TripBuilder, a new framework for personalized touristic tour planning. We mine from Flickr the information about the actual itineraries followed by a multitude of different tourists, and we match these itineraries on the touristic Point of Interests available from Wikipedia. The task of planning personalized touristic tours is then modeled as an instance of the Generalized Maximum Coverage problem. Wisdom-of-the-crowds information allows us to derive touristic plans that maximize a measure of interest for the tourist given her preferences and visiting time-budget. Experimental results on three different touristic cities show that our approach is effective and outperforms strong baselines.
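The authors' optimization procedure is not given in the abstract; as a rough, hypothetical sketch of how a time-budgeted coverage objective can be approximated, the Python below greedily adds points of interest by interest gained per hour until the visiting budget is spent. The `Poi` class, `plan_tour` function, and sample data are illustrative, not part of TripBuilder.

```python
from dataclasses import dataclass

@dataclass
class Poi:
    name: str
    interest: float      # estimated interest for this tourist
    visit_hours: float   # time needed to visit

def plan_tour(pois, time_budget):
    """Greedy sketch of a budgeted coverage objective: repeatedly take
    the POI with the best interest-per-hour ratio that still fits in
    the remaining time budget."""
    chosen, remaining = [], time_budget
    candidates = sorted(pois, key=lambda p: p.interest / p.visit_hours, reverse=True)
    for poi in candidates:
        if poi.visit_hours <= remaining:
            chosen.append(poi)
            remaining -= poi.visit_hours
    return chosen

if __name__ == "__main__":
    pois = [Poi("Colosseum", 9.0, 2.0), Poi("Pantheon", 7.0, 1.0),
            Poi("Trevi Fountain", 6.0, 0.5), Poi("Vatican Museums", 9.5, 4.0)]
    for poi in plan_tour(pois, time_budget=4.0):
        print(poi.name)
```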
DOI: https://doi.org/10.1145/2505515.2505643 · Published: 2013-10-27
Citations: 97
Spatio-temporal and events based analysis of topic popularity in twitter
S. Ardon, A. Bagchi, A. Mahanti, Amit Ruhela, Aaditeshwar Seth, R. M. Tripathy, Sipat Triukose
We present the first comprehensive characterization of the diffusion of ideas on Twitter, studying more than 5.96 million topics that include both popular and less popular topics. On a data set containing approximately 10 million users and a comprehensive scraping of 196 million tweets, we perform a rigorous temporal and spatial analysis, investigating the time-evolving properties of the subgraphs formed by the users discussing each topic. We focus on two different notions of the spatial: the network topology formed by follower-following links on Twitter, and the geospatial location of the users. We investigate the effect of initiators on the popularity of topics and find that users with a high number of followers have a strong impact on topic popularity. We deduce that topics become popular when disjoint clusters of users discussing them begin to merge and form one giant component that grows to cover a significant fraction of the network. Our geospatial analysis shows that highly popular topics are those that cross regional boundaries aggressively.
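One finding above is that topics take off once disjoint clusters of discussing users merge into a single giant component. As a minimal sketch (not the authors' analysis pipeline), the union-find code below tracks the fraction of users inside the largest component as follower edges among topic participants arrive in time order; the sample data is made up.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def giant_component_trace(num_users, timed_edges):
    """Yield (timestamp, fraction of users in the largest component)
    as edges between users discussing a topic arrive in time order."""
    uf = UnionFind(num_users)
    for ts, u, v in sorted(timed_edges):
        uf.union(u, v)
        largest = max(uf.size[uf.find(x)] for x in range(num_users))
        yield ts, largest / num_users

if __name__ == "__main__":
    edges = [(1, 0, 1), (2, 2, 3), (3, 1, 2), (4, 3, 4)]
    for ts, frac in giant_component_trace(5, edges):
        print(ts, round(frac, 2))
```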
DOI: https://doi.org/10.1145/2505515.2505525 · Published: 2013-10-27
Citations: 63
SPHINX: rich insights into evidence-hypotheses relationships via parameter space-based exploration
Abhishek Mukherji, Jason Whitehouse, Christopher R. Botaish, Elke A. Rundensteiner, M. Ward
We demonstrate our SPHINX system that not only derives but also visualizes evidence-hypotheses relationships on a parameter space of belief and plausibility. SPHINX facilitates the analyst to interactively explore the contribution of different pieces of evidence towards the hypotheses. The key technical contributions of SPHINX include both computational and visual dimensions. The computational contributions cover (a.) flexible computational model selection; and (b.) real-time incremental strength computations. The visual contributions include (a.) sense-making over parameter space; (b.) filtering and abstraction options; (c.) novel visual displays such as evidence glyph and skyline views. Using two real datasets, we will demonstrate that the SPHINX system provides the analysts with rich insights into evidence-hypothesis relationships facilitating the discovery and decision making process.
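SPHINX is a demonstration system and the abstract gives no formulas; purely as background on the belief/plausibility parameter space it refers to, the sketch below computes Dempster-Shafer belief and plausibility for a hypothesis from a hypothetical evidence mass function. It is not the SPHINX computation itself.

```python
def belief(mass, hypothesis):
    """Bel(H): total mass of all focal sets fully contained in H."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)

def plausibility(mass, hypothesis):
    """Pl(H): total mass of all focal sets that intersect H."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)

if __name__ == "__main__":
    # Hypothetical mass function over two hypotheses {A, B};
    # the mass on {A, B} represents evidence that cannot discriminate.
    mass = {
        frozenset({"A"}): 0.5,
        frozenset({"B"}): 0.2,
        frozenset({"A", "B"}): 0.3,
    }
    h = frozenset({"A"})
    print("belief:", belief(mass, h))              # 0.5
    print("plausibility:", plausibility(mass, h))  # 0.8
```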
DOI: https://doi.org/10.1145/2505515.2508202 · Published: 2013-10-27
Citations: 1
The water filling model and the cube test: multi-dimensional evaluation for professional search
Jiyun Luo, Christopher Wing, G. Yang, Marti A. Hearst
Professional search activities such as patent and legal search are often time sensitive and consist of rich information needs with multiple aspects or subtopics. This paper proposes a 3D water filling model to describe this search process, and derives a new evaluation metric, the Cube Test, to encompass the complex nature of professional search. The new metric is compared against state-of-the-art patent search evaluation metrics as well as Web search evaluation metrics over two distinct patent datasets. The experimental results show that the Cube Test metric effectively captures the characteristics and requirements of professional search.
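The published definition of the Cube Test is not reproduced in the abstract; the sketch below is only a loose illustration of the water-filling idea, with each subtopic treated as a cell that stops yielding gain once full and the accumulated gain normalized by reading time. The capacities, relevance values, and normalization are assumptions, not the paper's metric.

```python
def water_filling_gain(ranked_docs, capacities, time_per_doc=1.0):
    """Hypothetical simplification of a water-filling evaluation idea:
    each subtopic is a cell with a capacity; a document pours its
    relevance into the cells it covers until they are full, and the
    accumulated gain is normalized by the time spent reading.
    This is an illustration, not the published Cube Test definition."""
    filled = {s: 0.0 for s in capacities}
    gain, elapsed = 0.0, 0.0
    for doc in ranked_docs:               # doc: {subtopic: relevance}
        elapsed += time_per_doc
        for subtopic, rel in doc.items():
            room = capacities[subtopic] - filled[subtopic]
            poured = min(rel, room)
            filled[subtopic] += poured
            gain += poured
    return gain / elapsed if elapsed else 0.0

if __name__ == "__main__":
    capacities = {"claims": 1.0, "prior_art": 1.0}
    ranking = [{"claims": 0.8}, {"claims": 0.6, "prior_art": 0.4}, {"prior_art": 0.9}]
    print(round(water_filling_gain(ranking, capacities), 3))
```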
DOI: https://doi.org/10.1145/2505515.2523648 · Published: 2013-10-27
Citations: 41
Evaluating aggregated search using interleaving
A. Chuklin, Anne Schuth, Katja Hofmann, P. Serdyukov, M. de Rijke
A result page of a modern web search engine is often much more complicated than a simple list of "ten blue links." In particular, a search engine may combine results from different sources (e.g., Web, News, and Images), and display these as grouped results to provide a better user experience. Such a system is called an aggregated or federated search system. Because search engines evolve over time, their results need to be constantly evaluated. However, one of the most efficient and widely used evaluation methods, interleaving, cannot be directly applied to aggregated search systems, as it ignores the need to group results originating from the same source (vertical results). We propose an interleaving algorithm that allows comparisons of search engine result pages containing grouped vertical documents. We compare our algorithm to existing interleaving algorithms and other evaluation methods (such as A/B-testing), both on real-life click log data and in simulation experiments. We find that our algorithm allows us to perform unbiased and accurate interleaved comparisons that are comparable to conventional evaluation techniques. We also show that our interleaving algorithm produces a ranking that does not substantially alter the user experience, while being sensitive to changes in both the vertical result block and the non-vertical document rankings. All this makes our proposed interleaving algorithm an essential tool for comparing IR systems with complex aggregated pages.
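The vertical-aware algorithm proposed in the paper is not spelled out in the abstract; for background, the sketch below implements plain team-draft interleaving, the standard method such work builds on, without the grouping constraint for vertical blocks. Rankings and document identifiers are made up.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Plain team-draft interleaving (no vertical grouping): in each
    round a randomly ordered pair of 'teams' alternately contribute
    their highest-ranked document not yet picked. Returns the
    interleaved list and which team supplied each item."""
    interleaved, teams, used = [], [], set()

    def next_unused(ranking):
        return next((d for d in ranking if d not in used), None)

    while len(used) < len(set(ranking_a) | set(ranking_b)):
        order = [("A", ranking_a), ("B", ranking_b)]
        rng.shuffle(order)
        for team, ranking in order:
            doc = next_unused(ranking)
            if doc is not None:
                interleaved.append(doc)
                teams.append(team)
                used.add(doc)
    return interleaved, teams

if __name__ == "__main__":
    a = ["d1", "d2", "d3"]
    b = ["d2", "d4", "d1"]
    docs, teams = team_draft_interleave(a, b, rng=random.Random(0))
    print(list(zip(docs, teams)))
```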
DOI: https://doi.org/10.1145/2505515.2505698 · Published: 2013-10-27
Citations: 32
Disinformation techniques for entity resolution
Steven Euijong Whang, H. Garcia-Molina
We study the problem of disinformation. We assume that an ``agent'' has some sensitive information that the ``adversary'' is trying to obtain. For example, a camera company (the agent) may secretly be developing its new camera model, and a user (the adversary) may want to know in advance the detailed specs of the model. The agent's goal is to disseminate false information to ``dilute'' what is known by the adversary. We model the adversary as an Entity Resolution (ER) process that pieces together available information. We formalize the problem of finding the disinformation with the highest benefit given a limited budget for creating the disinformation and propose efficient algorithms for solving the problem. We then evaluate our disinformation planning algorithms on real and synthetic data and compare the robustness of existing ER algorithms. In general, our disinformation techniques can be used as a framework for testing ER robustness.
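Neither the ER model nor the planning algorithm appears in the abstract; as a toy, hypothetical illustration of the "dilution" idea, the code below runs a naive threshold-based ER pass and shows how planting one fake record enlarges, and thereby dilutes, the cluster an adversary would resolve around the sensitive entity. The similarity measure, threshold, and records are assumptions.

```python
from itertools import combinations

def similarity(r1, r2):
    """Fraction of shared attributes with equal values (toy measure)."""
    keys = set(r1) & set(r2)
    return sum(r1[k] == r2[k] for k in keys) / len(keys) if keys else 0.0

def resolve(records, threshold=0.5):
    """Naive ER: link records whose similarity meets the threshold and
    return connected components as resolved entities."""
    parent = list(range(len(records)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

if __name__ == "__main__":
    # Hypothetical leaked snippets about an unreleased camera.
    real = [{"brand": "Acme", "model": "X9", "sensor": "48MP"},
            {"brand": "Acme", "model": "X9", "sensor": "48MP", "price": "$999"}]
    fake = {"brand": "Acme", "model": "X9", "sensor": "12MP", "price": "$499"}
    print("without disinformation:", resolve(real))
    print("with disinformation:   ", resolve(real + [fake]))
```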
DOI: https://doi.org/10.1145/2505515.2505636 · Published: 2013-10-27
Citations: 8
Personalization of web-search using short-term browsing context
Yury Ustinovsky, P. Serdyukov
Search and browsing activity is known to be a valuable source of information about user's search intent. It is extensively utilized by most of modern search engines to improve ranking by constructing certain ranking features as well as by personalizing search. Personalization aims at two major goals: extraction of stable preferences of a user and specification and disambiguation of the current query. The common way to approach these problems is to extract information from user's search and browsing long-term history and to utilize short-term history to determine the context of a given query. Personalization of the web search for the first queries in new search sessions of new users is more difficult due to the lack of both long- and short-term data. In this paper we study the problem of short-term personalization. To be more precise, we restrict our attention to the set of initial queries of search sessions. These, with the lack of contextual information, are known to be the most challenging for short-term personalization and are not covered by previous studies on the subject. To approach this problem in the absence of the search context, we employ short-term browsing context. We apply a widespread framework for personalization of search results based on the re-ranking approach and evaluate our methods on the large scale data. The proposed methods are shown to significantly improve non-personalized ranking of one of the major commercial search engines. To the best of our knowledge this is the first study addressing the problem of short-term personalization based on recent browsing history. We find that performance of this re-ranking approach can be reasonably predicted given a query. When we restrict the use of our method to the queries with largest expected gain, the resulting benefit of personalization increases significantly
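The features and learning method are not detailed in the abstract; as a hypothetical sketch of context-based re-ranking, the code below blends each result's original score with the cosine similarity between its text and the terms of recently browsed pages. The blending weight and toy data are assumptions, not the paper's model.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(results, browsing_context, weight=0.3):
    """Hypothetical re-ranking sketch: blend the engine's original score
    with the similarity between the result text and the short-term
    browsing context (a bag of terms from recently visited pages)."""
    context_terms = Counter(browsing_context)
    scored = []
    for doc_id, text, base_score in results:
        sim = cosine(Counter(text.split()), context_terms)
        scored.append((base_score + weight * sim, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

if __name__ == "__main__":
    results = [("d1", "hotel booking rome", 0.52),
               ("d2", "rome colosseum tickets opening hours", 0.50)]
    context = "colosseum tickets ancient rome forum".split()
    print(rerank(results, context))
```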
DOI: https://doi.org/10.1145/2505515.2505679 · Published: 2013-10-27
Citations: 35
Inferring anchor links across multiple heterogeneous social networks
Xiangnan Kong, Jiawei Zhang, Philip S. Yu
Online social networks can often be represented as heterogeneous information networks containing abundant information about: who, where, when and what. Nowadays, people are usually involved in multiple social networks simultaneously. The multiple accounts of the same user in different networks are mostly isolated from each other without any connection between them. Discovering the correspondence of these accounts across multiple social networks is a crucial prerequisite for many interesting inter-network applications, such as link recommendation and community analysis using information from multiple networks. In this paper, we study the problem of anchor link prediction across multiple heterogeneous social networks, i.e., discovering the correspondence among different accounts of the same user. Unlike most prior work on link prediction and network alignment, we assume that the anchor links are one-to-one relationships (i.e., no two edges share a common endpoint) between the accounts in two social networks, and a small number of anchor links are known beforehand. We propose to extract heterogeneous features from multiple heterogeneous networks for anchor link prediction, including user's social, spatial, temporal and text information. Then we formulate the inference problem for anchor links as a stable matching problem between the two sets of user accounts in two different networks. An effective solution, MNA (Multi-Network Anchoring), is derived to infer anchor links w.r.t. the one-to-one constraint. Extensive experiments on two real-world heterogeneous social networks show that our MNA model consistently outperform other commonly-used baselines on anchor link prediction.
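The abstract casts one-to-one anchor link inference as stable matching; since MNA itself is not given, the sketch below simply runs textbook Gale-Shapley over hypothetical cross-network similarity scores, with accounts ranking their counterparts by score.

```python
def stable_match(scores):
    """Textbook Gale-Shapley over a score matrix scores[a][b] giving the
    similarity between account a in network A and account b in network B.
    Accounts in A 'propose' in decreasing score order; accounts in B keep
    the best proposal seen so far. Returns a one-to-one matching A -> B."""
    prefs = {a: sorted(row, key=row.get, reverse=True) for a, row in scores.items()}
    next_choice = {a: 0 for a in scores}
    engaged_to = {}                      # b -> a
    free = list(scores)

    def b_prefers(b, new_a, current_a):
        return scores[new_a][b] > scores[current_a][b]

    while free:
        a = free.pop()
        b = prefs[a][next_choice[a]]
        next_choice[a] += 1
        if b not in engaged_to:
            engaged_to[b] = a
        elif b_prefers(b, a, engaged_to[b]):
            free.append(engaged_to[b])
            engaged_to[b] = a
        else:
            free.append(a)
    return {a: b for b, a in engaged_to.items()}

if __name__ == "__main__":
    # Hypothetical similarity scores between accounts of two networks.
    scores = {"alice@A": {"al1ce@B": 0.9, "bobby@B": 0.2},
              "bob@A":   {"al1ce@B": 0.4, "bobby@B": 0.7}}
    print(stable_match(scores))
```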
DOI: https://doi.org/10.1145/2505515.2505531 · Published: 2013-10-27
Citations: 336
Predicting the impact of expansion terms using semantic and user interaction features
A. Bakhtin, Yury Ustinovsky, P. Serdyukov
Query expansion for Information Retrieval is a challenging task. On the one hand, low quality expansion may hurt either recall, due to vocabulary mismatch, or precision, due to topic drift, and therefore reduce user satisfaction. On the other hand, utilizing a large number of expansion terms for a query may easily lead to resource consumption overhead. As web search engines apply strict constraints on response time, it is essential to estimate the impact of each expansion term on query performance at the pre-retrieval time. Our experimental results confirm that a significant part of expansions do not improve query performance, and it is possible to detect such expansions at the pre-retrieval time.
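The concrete features and model are not listed in the abstract; under assumed feature names and toy data, the sketch below shows how a pre-retrieval classifier could be trained with scikit-learn's logistic regression to keep only expansion terms predicted to help.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical pre-retrieval features for candidate expansion terms:
# [semantic similarity to the query, term IDF, click-through rate of
#  past queries containing the term]. Labels: 1 if the expansion
# improved retrieval quality in historical data, else 0.
X_train = np.array([[0.82, 3.1, 0.20],
                    [0.15, 1.2, 0.01],
                    [0.64, 2.7, 0.12],
                    [0.30, 0.9, 0.02]])
y_train = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X_train, y_train)

def keep_expansion(features, threshold=0.5):
    """Keep a candidate expansion term only if the predicted probability
    of it improving the query exceeds the threshold."""
    prob = model.predict_proba(np.array([features]))[0, 1]
    return prob >= threshold

if __name__ == "__main__":
    print(keep_expansion([0.75, 2.9, 0.15]))   # likely True
    print(keep_expansion([0.10, 1.0, 0.01]))   # likely False
```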
DOI: https://doi.org/10.1145/2505515.2507872 · Published: 2013-10-27
Citations: 2
Users versus models: what observation tells us about effectiveness metrics
Alistair Moffat, Paul Thomas, Falk Scholer
Retrieval system effectiveness can be measured in two quite different ways: by monitoring the behavior of users and gathering data about the ease and accuracy with which they accomplish certain specified information-seeking tasks; or by using numeric effectiveness metrics to score system runs in reference to a set of relevance judgments. In the second approach, the effectiveness metric is chosen in the belief that user task performance, if it were to be measured by the first approach, should be linked to the score provided by the metric. This work explores that link, by analyzing the assumptions and implications of a number of effectiveness metrics, and exploring how these relate to observable user behaviors. Data recorded as part of a user study included user self-assessment of search task difficulty; gaze position; and click activity. Our results show that user behavior is influenced by a blend of many factors, including the extent to which relevant documents are encountered, the stage of the search process, and task difficulty. These insights can be used to guide development of batch effectiveness metrics.
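The study analyzes the assumptions built into batch effectiveness metrics rather than defining a new one; as one concrete example of such a metric and its embedded user model, the sketch below computes rank-biased precision, whose persistence parameter encodes how deep a user is assumed to read. The relevance vector is made up.

```python
def rank_biased_precision(relevances, persistence=0.8):
    """Rank-biased precision: RBP = (1 - p) * sum_i rel_i * p**(i-1),
    where p models the probability that the user continues from one
    rank to the next. Higher p models a more persistent user."""
    return (1 - persistence) * sum(
        rel * persistence ** i for i, rel in enumerate(relevances))

if __name__ == "__main__":
    run = [1, 0, 1, 1, 0]          # binary relevance of the top-5 results
    for p in (0.5, 0.8, 0.95):
        print(p, round(rank_biased_precision(run, p), 3))
```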
DOI: https://doi.org/10.1145/2505515.2507665 · Published: 2013-10-27
Citations: 93