
Proceedings of the 25th International Conference on World Wide Web: Latest Articles

Joint Recognition and Linking of Fine-Grained Locations from Tweets
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883067
Zongcheng Ji, Aixin Sun, G. Cong, Jialong Han
Many users casually reveal their locations, such as restaurants, landmarks, and shops, in their tweets. Recognizing such fine-grained locations from tweets and then linking the location mentions to well-defined location profiles (e.g., with formal name, detailed address, and geo-coordinates) offers a tremendous opportunity for many applications. Unlike existing solutions, which perform location recognition and linking as two sub-tasks sequentially in a pipeline setting, in this paper we propose a novel joint framework that performs location recognition and location linking simultaneously in a joint search space. We formulate this end-to-end location linking problem as a structured prediction problem and propose a beam-search-based algorithm. Based on the concept of multi-view learning, we further enable the algorithm to learn from unlabeled data to alleviate the dearth of labeled data. Extensive experiments are conducted to recognize locations mentioned in tweets and link them to location profiles in Foursquare. Experimental results show that the proposed joint learning algorithm outperforms the state-of-the-art solutions, and learning from unlabeled data improves both the recognition and linking accuracy.
Citations: 52
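The joint idea in the abstract above fits in a short sketch. Instead of fixing mention boundaries first and linking them afterwards, each search step decides at once whether a span is a location and which profile it links to. The following is a minimal beam-search illustration under stated assumptions. The gazetteer, its profile IDs (`fsq:...`), the scores, and the function name `joint_beam_search` are all hypothetical. In the paper the scores come from a learned structured-prediction model, not a fixed lookup table.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: lowercase surface form -> candidate (profile_id, score).
# In the paper, scores come from a learned model; these numbers are made up.
Gazetteer = Dict[str, List[Tuple[str, float]]]
# A hypothesis is (total score, [(start, end, profile_id), ...]).
Hyp = Tuple[float, List[Tuple[int, int, str]]]


def joint_beam_search(tokens: List[str], gaz: Gazetteer, beam_size: int = 3,
                      max_len: int = 3, skip_score: float = 0.0) -> List[Hyp]:
    """Choose mention spans and their linked profiles in one search.

    beams[i] holds partial hypotheses that have consumed exactly tokens[:i].
    """
    n = len(tokens)
    beams: List[List[Hyp]] = [[] for _ in range(n + 1)]
    beams[0].append((0.0, []))
    for i in range(n):
        # Keep only the top-scoring hypotheses that end at position i.
        beams[i] = sorted(beams[i], key=lambda h: h[0], reverse=True)[:beam_size]
        for score, spans in beams[i]:
            # Option 1: token i is not part of any location mention.
            beams[i + 1].append((score + skip_score, spans))
            # Option 2: a mention of 1..max_len tokens starts at i and is
            # linked to a profile in the same step (recognition + linking).
            for length in range(1, max_len + 1):
                if i + length > n:
                    break
                surface = " ".join(tokens[i:i + length]).lower()
                for profile_id, s in gaz.get(surface, []):
                    beams[i + length].append(
                        (score + s, spans + [(i, i + length, profile_id)]))
    return sorted(beams[n], key=lambda h: h[0], reverse=True)[:beam_size]


tokens = "lunch at pizza express near orchard road".split()
gaz: Gazetteer = {
    "pizza": [("fsq:pizza_hut", 0.5)],
    "pizza express": [("fsq:pe_orchard", 2.0), ("fsq:pe_marina", 1.5)],
    "orchard": [("fsq:orchard_mall", 0.7)],
    "orchard road": [("fsq:orchard_rd", 1.8)],
}
best_score, best_spans = joint_beam_search(tokens, gaz)[0]
print(best_spans)  # [(2, 4, 'fsq:pe_orchard'), (5, 7, 'fsq:orchard_rd')]
```

In this toy run the shorter mention "pizza" is recognizable, but it loses to "pizza express" because that span's linking score is higher. A strict pipeline would have to commit to a boundary before it sees that evidence.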
Tweet Properly: Analyzing Deleted Tweets to Understand and Identify Regrettable Ones
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883052
Lu Zhou, Wenbo Wang, Keke Chen
Inappropriate tweets can cause severe damage to their authors' reputation or privacy. However, many users do not realize the negative consequences until after they publish these tweets. Published tweets have lasting effects that may not be eliminated by simple deletion, because other users may have read them or third-party tweet analysis platforms may have cached them. Regrettable tweets, i.e., tweets with identifiable regrettable contents, cause the most damage to their authors because other users can easily notice them. In this paper, we study how to identify the regrettable tweets published by normal individual users via their contents and the users' historical deletion patterns. We identify normal individual users based on their publishing, deletion, follower, and friend statistics. We manually examine a set of randomly sampled deleted tweets from these users to identify regrettable tweets and understand the corresponding regrettable reasons. By applying content-based features and personalized history-based features, we develop classifiers that can effectively predict regrettable tweets.
Citations: 52
Voting with Their Feet: Inferring User Preferences from App Management Activities
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2874814
Huoran Li, W. Ai, Xuanzhe Liu, Jian Tang, Gang Huang, Feng Feng, Q. Mei
Smartphone users have adopted an explosive number of mobile applications (a.k.a. apps) in recent years. App marketplaces for the iOS, Android, and Windows Phone platforms host millions of apps, which have been downloaded more than 100 billion times. Investigating how people manage mobile apps in their everyday lives creates a unique opportunity to understand the behavior and preferences of mobile users, to infer the quality of apps, and to improve the user experience. Existing literature provides very limited knowledge about app management activities, due to the lack of user behavioral data at scale. This paper takes the initiative to analyze a very large app management log collected through a leading Android app marketplace. The data set covers five months of detailed downloading, updating, and uninstallation activities, involving 17 million anonymized users and one million apps. We present a surprising finding: the metrics commonly used by app stores to rank apps do not truly reflect users' real attitudes towards the apps. We then identify useful patterns in the app management activities that predict users' preferences for an app much more accurately, even when no user rating is available.
Citations: 29
Predicting Pre-click Quality for Native Advertisements
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883053
K. Zhou, Miriam Redi, Andrew Haines, M. Lalmas
Native advertising is a specific form of online advertising where ads replicate the look-and-feel of their serving platform. In such a context, providing a good user experience with the served ads is crucial to ensure long-term user engagement. In this work, we explore the notion of ad quality, namely the effectiveness of advertising from a user experience perspective. We design a learning framework to predict the pre-click quality of native ads. More specifically, we look at detecting offensive native ads, showing that, to quantify ad quality, offensive-ad user feedback rates are more reliable than the commonly used click-through rate metrics. We then conduct a crowdsourcing study to identify which criteria drive user preferences in native advertising. We translate these criteria into a set of ad quality features that we extract from the ad text, image, and advertiser, and then use them to train a model able to identify offensive ads. We show that our model is very effective in detecting offensive ads, and we provide in-depth insights into how different features affect ad quality. Finally, we deploy a preliminary version of this model and show its effectiveness in reducing the offensive ad feedback rate.
Citations: 30
La Sécurité Ouverte How We Doin? So Far?
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883583
M. Zurko
Open has meant a lot of things in the web thus far. The openness of the web has had profound implications for web security, from the beginning through to today. Each time the underlying web technology changes, we do a reset on the security it provides. Patterns and differences emerge in each round of security responses and challenges. What has that brought us as web users, technologists, researchers, and as a global community? What can we expect going forward? And what should we work towards as web technologists and caretakers?
Citations: 0
Exploring Patterns of Identity Usage in Tweets: A New Problem, Solution and Case Study
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883027
K. Joseph, Wei Wei, Kathleen M. Carley
Sociologists have long been interested in the ways that identities, or labels for people, are created, used, and applied across various social contexts. The present work makes two contributions to the study of identity, in particular the study of identity in text. We first consider the following novel NLP task: given a set of text data (here, from Twitter), label each word in the text as being representative of a (possibly multi-word) identity or not. To address this task, we develop a comprehensive feature set that leverages several avenues of recent NLP work on Twitter and use these features to train a supervised classifier. Our model outperforms a surprisingly strong rule-based baseline by 33%. We then use our model for a case study, applying it to a large corpus of Twitter data from users who actively discussed the Eric Garner and Michael Brown cases. Among other findings, we observe that the identities used by individuals differ in interesting ways based on social context measures derived from census data.
Citations: 10
PCT: Partial Co-Alignment of Social Networks
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883038
Jiawei Zhang, Philip S. Yu
People nowadays usually participate in multiple online social networks simultaneously to enjoy more social network services. Besides common users, social networks providing similar services can also share many other kinds of information entities, e.g., locations, videos, and products. However, these shared information entities in different networks are mostly isolated, without any known corresponding connections. In this paper, we aim to infer such potential corresponding connections, linking multiple kinds of shared entities across networks simultaneously. Formally, the problem is referred to as the network "Partial Co-alignmenT" (PCT) problem. PCT is an important problem and can be a prerequisite for many concrete cross-network applications, such as social network fusion and mutual information exchange and transfer. Meanwhile, the PCT problem is also very challenging to address for several reasons: (1) the heterogeneity of social networks, (2) the lack of training instances to build models, and (3) the one-to-one constraint on the corresponding connections. To resolve these challenges, a novel unsupervised network alignment framework, UNICOAT (UNsupervIsed COncurrent AlignmenT), is introduced in this paper. Based on the heterogeneous information, UNICOAT transforms the PCT problem into a joint optimization problem. To solve the objective function, the one-to-one constraint on the corresponding relationships is relaxed, and the redundant non-existing corresponding connections introduced by this relaxation are pruned with a novel network co-matching algorithm proposed in this paper. Extensive experiments conducted on real-world co-aligned social network datasets demonstrate the effectiveness of UNICOAT in addressing the PCT problem.
Citations: 76
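The PCT abstract above describes a "relax, then prune" step. An optimizer that ignores the one-to-one constraint can score a single entity as matching several counterparts, so the redundant links must be removed afterwards. The minimal sketch below is not the paper's co-matching algorithm. It swaps in a plain greedy one-to-one matching with a score threshold to show what the pruning has to achieve. The entity names, scores, threshold, and the function name `prune_to_one_to_one` are hypothetical. One could run such a step separately for each shared entity type (users, locations, and so on).

```python
from typing import Dict, List, Tuple

# Relaxed alignment scores between entities of two networks, e.g. as produced
# by an optimizer that ignored the one-to-one constraint. Values are made up.
Scores = Dict[Tuple[str, str], float]


def prune_to_one_to_one(scores: Scores, threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Greedy stand-in for the pruning step (not the paper's co-matching).

    Visits candidate pairs from highest to lowest score, keeps a pair only if
    neither endpoint is already matched, and drops everything below
    `threshold`, so some entities may stay unaligned (partial alignment).
    """
    left_used, right_used = set(), set()
    kept: List[Tuple[str, str]] = []
    for (u, v), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s < threshold:
            break  # all remaining pairs score even lower
        if u in left_used or v in right_used:
            continue  # keeping this pair would violate one-to-one
        left_used.add(u)
        right_used.add(v)
        kept.append((u, v))
    return kept


relaxed: Scores = {
    ("alice@A", "alice@B"): 0.9,
    ("alice@A", "bob@B"): 0.6,
    ("bob@A", "bob@B"): 0.8,
    ("carol@A", "bob@B"): 0.55,
    ("carol@A", "dave@B"): 0.3,
}
print(prune_to_one_to_one(relaxed))
# [('alice@A', 'alice@B'), ('bob@A', 'bob@B')]
```

Greedy matching is only a heuristic. An optimal assignment solver would maximize the total kept score instead. The threshold is what makes the result partial: `carol@A` stays unaligned rather than being forced onto a weak match.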
Table Cell Search for Question Answering
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883080
Huan Sun, Hao Ma, Xiaodong He, Wen-tau Yih, Yu Su, Xifeng Yan
Tables are pervasive on the Web. Informative web tables span a large variety of topics and can naturally serve as a significant resource for satisfying users' information needs. Driven by this observation, in this paper we investigate an important yet largely under-addressed problem: given millions of tables, how can we precisely retrieve table cells to answer a user question? This work proposes a novel table cell search framework to attack this problem. We first formulate the concept of a relational chain, which connects two cells in a table and represents the semantic relation between them. With the help of search engine snippets, our framework generates a set of relational chains pointing to potentially correct answer cells. We further employ deep neural networks to conduct more fine-grained inference on which relational chains best match the input question, and finally extract the corresponding answer cells. Based on millions of tables crawled from the Web, we evaluate our framework in the open-domain question answering (QA) setting, using both the well-known WebQuestions dataset and user queries mined from Bing search engine logs. On WebQuestions, our framework is comparable to state-of-the-art QA systems based on knowledge bases (KBs), while on Bing queries it outperforms other systems with a 56.7% relative gain. Moreover, when combined with results from our framework, KB-based QA performance obtains a relative improvement of 28.1% to 66.7%, demonstrating that web tables supply rich knowledge that might not exist in existing KBs or is difficult to identify there.
Citations: 109
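The "relational chain" idea in the abstract above can be illustrated with a simplified toy. A chain starts at a cell matching the question's topic entity, follows that cell's column header over to another column's header in the same row, and ends at that column's cell, which is the candidate answer. The sketch below relies on several hypothetical simplifications. The chain is reduced to a `(topic header, answer header, answer cell)` tuple. The topic entity is assumed to be already known. The search-engine snippets are not modeled. Plain word overlap replaces the paper's deep neural matching. The function names are illustrative only.

```python
from typing import List, Tuple

# A table as (column headers, rows). The chain format below is a hypothetical
# simplification: (topic column header, answer column header, answer cell).
Table = Tuple[List[str], List[List[str]]]
Chain = Tuple[str, str, str]


def candidate_chains(table: Table, topic: str) -> List[Chain]:
    """Connect each cell matching the topic entity to every other cell in its row."""
    headers, rows = table
    chains: List[Chain] = []
    for row in rows:
        for i, cell in enumerate(row):
            if cell.lower() != topic.lower():
                continue
            for j, other in enumerate(row):
                if j != i:
                    chains.append((headers[i], headers[j], other))
    return chains


def rank_chains(question: str, chains: List[Chain]) -> List[Chain]:
    """Stand-in scorer: count question words that appear in the answer column
    header. The paper uses deep neural networks for this matching step."""
    q_words = set(question.lower().replace("?", " ").split())

    def score(chain: Chain) -> int:
        return len(q_words & set(chain[1].lower().split()))

    return sorted(chains, key=score, reverse=True)


table: Table = (["Country", "Capital", "Currency"],
                [["France", "Paris", "Euro"], ["Japan", "Tokyo", "Yen"]])
question = "What is the currency of Japan?"
best = rank_chains(question, candidate_chains(table, topic="Japan"))[0]
print(best[2])  # Yen
```

The two-stage split mirrors the abstract's structure. Chain generation is cheap and permissive, and all the discrimination is pushed into the ranking step. This is the step where a learned model would replace the word-overlap heuristic.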
TrackMeOrNot: Enabling Flexible Control on Web Tracking
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883034
W. Meng, Byoungyoung Lee, Xinyu Xing, Wenke Lee
Recent advances in web tracking technologies have raised many privacy concerns. To combat users' fear of privacy invasion, online vendors have taken measures such as being more transparent with users about their data use and providing options for users to manage their online activities. Such efforts gain users' trust in online vendors and improve their willingness to share their digital footprints. However, a significant number of users still actively limit involuntary sharing of data, because vendor-provided management tools only restrict the use of collected data, and users worry that vendors do not have enough measures in place to protect their privacy-sensitive information. In this paper, we propose TrackMeOrNot, a new anti-tracking mechanism. It allows users to selectively share their online footprints with vendors. With TrackMeOrNot, users are no longer concerned with privacy. Using it, users can specify their privacy-sensitive activities and selectively disclose their activities to vendors based on their specified privacy demands. We implemented TrackMeOrNot in the Chromium browser and systematically evaluated its performance using a large set of test cases. We show that TrackMeOrNot can efficiently and effectively shield privacy-sensitive browsing activities.
Citations: 22
On the Temporal Dynamics of Opinion Spamming: Case Studies on Yelp
Pub Date : 2016-04-11 DOI: 10.1145/2872427.2883087
Santosh K.C., Arjun Mukherjee
Opinion spam has recently become widespread and has attracted considerable research attention. While the problem has been approached from a variety of angles, the temporal dynamics under which opinion spamming operates remain unclear. Do spammers employ specific spamming policies? What kinds of changes occur in the dynamics of truthful ratings on entities? How does buffered spamming operate for entities that need spam to retain a threshold level of popularity, and how does reduced spamming operate for entities that are more successful? We address these questions through time-series analysis of Yelp data. Our analyses reveal various temporal patterns and their relationships with the rate at which fake reviews are posted. Building on these analyses, we employ vector autoregression to predict the rate of deception across different spamming policies. Next, we explore the effect of filtered reviews on predicting the (long-term and imminent) future ratings and popularity of entities. Our results reveal novel temporal dynamics of spamming that are intuitive and arguable, and that also lend confidence to Yelp's filtering. Finally, we leverage the discovered temporal patterns for deception detection. Experimental results on large-scale review data show that our approach is effective and significantly improves on existing approaches.
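The abstract names vector autoregression (VAR) as its forecasting tool. As a rough illustration only, the sketch below fits a generic first-order model, y_t = c + A y_{t-1} + e_t, by ordinary least squares (one regression per variable, using the normal equations) and iterates it forward. The lag order of 1, the pure-Python solver, and the function names `fit_var1` and `forecast` are assumptions for illustration; the paper's actual variables, lag order, and estimation procedure are not given in this abstract.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(m: Matrix, b: Vector) -> Vector:
    """Solve m @ x = b by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]  # work on a copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_var1(series: Sequence[Sequence[float]]) -> Tuple[Vector, Matrix]:
    """OLS fit of y_t = c + A y_{t-1} + e_t; series holds one row per time step."""
    k = len(series[0])
    X = [[1.0, *series[t - 1]] for t in range(1, len(series))]  # [1, y_{t-1}]
    Y = [list(series[t]) for t in range(1, len(series))]        # y_t
    p = k + 1
    # X^T X is shared by all k equations, so build it once.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    c: Vector = []
    A: Matrix = []
    for eq in range(k):  # one least-squares regression per variable
        xty = [sum(X[t][i] * Y[t][eq] for t in range(len(X))) for i in range(p)]
        coef = solve(xtx, xty)
        c.append(coef[0])   # intercept
        A.append(coef[1:])  # row eq of A: weight on each lagged variable
    return c, A


def forecast(c: Vector, A: Matrix, last: Sequence[float], steps: int) -> List[Vector]:
    """Iterate the fitted model forward from the last observed vector."""
    out: List[Vector] = []
    y = list(last)
    for _ in range(steps):
        y = [c[i] + sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(c))]
        out.append(y)
    return out
```

In a setting like the one described, each row might hold, say, a business's daily fake-review rate and its mean truthful rating (a hypothetical choice of variables); `c, A = fit_var1(rows)` then `forecast(c, A, rows[-1], 7)` extrapolates a week ahead. A real analysis would also select the lag order and check stationarity, which this sketch skips.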
Citations: 65
Journal: Proceedings of the 25th International Conference on World Wide Web
Copyright © 2023 Book学术 All rights reserved.