Proceedings of The Web Conference 2020: Latest Papers

Adaptive Probabilistic Word Embedding
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380147
Shuangyin Li, Yu Zhang, Rong Pan, Kaixiang Mo
Word embeddings have been widely used and proven effective in many natural language processing and text modeling tasks. An ambiguous word can carry very different meanings in different contexts, a phenomenon called polysemy. Most existing work generates a single embedding for each word, while a few approaches build a limited number of embeddings to represent a word's different meanings. However, it is hard to determine the exact number of senses for each word, as word meaning depends on context. To address this problem, we propose a novel Adaptive Probabilistic Word Embedding (APWE) model, in which word polysemy is defined over a latent interpretable semantic space. Specifically, each word is first represented by an embedding in the latent semantic space; the APWE model then adaptively adjusts and updates this embedding according to the context to obtain a tailored word embedding. Empirical comparisons with state-of-the-art models demonstrate the superiority of the proposed APWE model.
Citations: 7
In Opinion Holders’ Shoes: Modeling Cumulative Influence for View Change in Online Argumentation
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380302
Zhen Guo, Zhe Zhang, Munindar P. Singh
Understanding how people change their views during multiparty argumentative discussions is important in applications that involve human communication, e.g., in social media and education. Existing research focuses on lexical features of individual comments, dynamics of discussions, or the personalities of participants but deemphasizes the cumulative influence of the interplay of comments by different participants on a participant’s mindset. We address the task of predicting the points where a user’s view changes given an entire discussion, thereby tackling the confusion due to multiple plausible alternatives when considering the entirety of a discussion. We make the following contributions. (1) Through a human study, we show that modeling a user’s perception of comments is crucial in predicting persuasiveness. (2) We present a sequential model for cumulative influence that captures the interplay between comments as both local and nonlocal dependencies, and demonstrate its capability of selecting the most effective information for changing views. (3) We identify contextual and interactive features and propose sequence structures to incorporate these features. Our empirical evaluation using a Reddit Change My View dataset shows that contextual and interactive features are valuable in predicting view changes, and a sequential model notably outperforms the nonsequential baseline models.
Citations: 8
On the Robustness of Cascade Diffusion under Node Attacks
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380028
Alvis Logins, Yuchen Li, Panagiotis Karras
How can we assess a network’s ability to maintain its functionality under attacks? Network robustness has been studied extensively in the case of deterministic networks. However, applications such as online information diffusion and the behavior of networked publics raise the question of robustness in probabilistic networks. We propose three novel robustness measures for networks hosting a diffusion under the Independent Cascade (IC) model, susceptible to node attacks. The outcome of such a process depends on the selection of its initiators, or seeds, by the seeder, as well as on two factors outside the seeder’s discretion: the attack strategy and the probabilistic diffusion outcome. We consider three levels of seeder awareness regarding these two uncontrolled factors, and evaluate the network’s viability aggregated over all possible extents of node attacks. We introduce novel algorithms from building blocks found in previous works to evaluate the proposed measures. A thorough experimental study with synthetic and real, scale-free and homogeneous networks establishes that these algorithms are effective and efficient, while the proposed measures highlight differences among networks in terms of robustness and the surprise they furnish when attacked. Last, we devise a new measure of diffusion entropy that can inform the design of probabilistically robust networks.
Citations: 12
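As a rough illustration of the setting in the abstract above, the following sketch simulates a diffusion under the standard Independent Cascade model after an attacker has removed some nodes. It is not the paper's algorithm or any of its three robustness measures; the graph encoding and the names `independent_cascade` and `expected_spread` are assumptions made for this example.

```python
import random


def independent_cascade(graph, seeds, removed=frozenset(), rng=None):
    """Run one Independent Cascade diffusion and return the activated nodes.

    graph:   dict mapping node -> list of (neighbor, activation_probability)
    seeds:   initially active nodes chosen by the seeder
    removed: nodes deleted by the attacker (hypothetical attack model)
    """
    rng = rng or random.Random()
    active = {s for s in seeds if s not in removed}
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v in removed or v in active:
                    continue
                # Each newly active node gets one chance to activate each neighbor.
                if rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, removed=frozenset(), runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(
        len(independent_cascade(graph, seeds, removed, rng)) for _ in range(runs)
    )
    return total / runs
```

Comparing `expected_spread(graph, seeds)` with `expected_spread(graph, seeds, removed=attacked)` gives a simple view of how much a given attack reduces the diffusion.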
Asymptotic Behavior of Sequence Models
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380044
Flavio Chierichetti, Ravi Kumar, A. Tomkins
In this paper we study the limiting dynamics of a sequential process that generalizes Pólya’s urn. This process has been studied also in the context of language generation, discrete choice, repeat consumption, and models for the web graph. The process we study generates future items by copying from past items. It is parameterized by a sequence of weights describing how much to prefer copying from recent versus more distant locations. We show that, if the weight sequence follows a power law with exponent α ∈ [0, 1), then the sequences generated by the model tend toward a limiting behavior in which the eventual frequency of each token in the alphabet attains a limit. Moreover, in the case α > 2, we show that the sequence converges to a token being chosen infinitely often, and each other token being chosen only constantly many times.
Citations: 2
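The copying process in the abstract above can be sketched in a few lines. This is a minimal simulation under stated assumptions, not the authors' formal model: the weight for copying from distance d is taken to be d^(-α), the sequence is initialized with a caller-supplied prefix, and the name `generate_sequence` is hypothetical.

```python
import random


def generate_sequence(initial, length, alpha, seed=0):
    """Extend `initial` to `length` items by repeatedly copying a past item.

    The item at distance d behind the current position (d = 1 is the most
    recent item) is copied with probability proportional to d ** -alpha.
    With alpha = 0 every past item is equally likely, as in Polya's urn.
    """
    rng = random.Random(seed)
    seq = list(initial)
    while len(seq) < length:
        t = len(seq)
        distances = range(1, t + 1)
        weights = [d ** -alpha for d in distances]
        d = rng.choices(distances, weights=weights, k=1)[0]
        seq.append(seq[t - d])
    return seq
```

Counting token frequencies over long runs for different values of `alpha` is one way to observe the contrast between the two regimes the abstract describes. The loop recomputes all weights at each step, so it is only suitable for short sequences.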
ROSE: Role-based Signed Network Embedding
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380038
Amin Javari, Tyler Derr, Pouya Esmailian, Jiliang Tang, K. Chang
In real-world networks, nodes might have more than one type of relationship. Signed networks are an important class of such networks consisting of two types of relations: positive and negative. Recently, embedding signed networks has attracted increasing attention and is more challenging than embedding classic networks, since nodes are connected by paths with multiple types of links. Existing works capture the complex relationships by relying on social theories. However, this approach has major drawbacks, including the incompleteness and inaccuracy of such theories. Thus, we propose network-transformation-based embedding to address these shortcomings. The core idea is that rather than directly finding the similarities of two nodes from the complex paths connecting them, we can obtain their similarities through simple paths connecting their different roles. We employ this idea to build our proposed embedding technique, which can be described in three steps: (1) the input directed signed network is transformed into an unsigned bipartite network with each node mapped to a set of nodes we denote as role-nodes. Each role-node captures a certain role that a node in the original network plays; (2) the network of role-nodes is embedded; and (3) the original network is encoded by aggregating the embedding vectors of role-nodes. Our experiments show that the proposed technique substantially outperforms existing models.
Citations: 30
Apophanies or Epiphanies? How Crawlers Impact Our Understanding of the Web
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380113
Syed Suleman Ahmad, Muhammad Daniyal Dar, Muhammad Fareed Zaffar, N. Vallina-Rodriguez, Rishab Nithyanand
Data generated by web crawlers has formed the basis for much of our current understanding of the Internet. However, not all crawlers are created equal and crawlers generally find themselves trading off between computational overhead, developer effort, data accuracy, and completeness. Therefore, the choice of crawler has a critical impact on the data generated and knowledge inferred from it. In this paper, we conduct a systematic study of the trade-offs presented by different crawlers and the impact that these can have on various types of measurement studies. We make the following contributions: First, we conduct a survey of all research published since 2015 in the premier security and Internet measurement venues to identify and verify the repeatability of crawling methodologies deployed for different problem domains and publication venues. Next, we conduct a qualitative evaluation of a subset of all crawling tools identified in our survey. This evaluation allows us to draw conclusions about the suitability of each tool for specific types of data gathering. Finally, we present a methodology and a measurement framework to empirically highlight the differences between crawlers and how the choice of crawler can impact our understanding of the web.
Citations: 22
Differentially Private Stream Processing for the Semantic Web
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380265
Daniele Dell'Aglio, A. Bernstein
Data often contains sensitive information, which poses a major obstacle to publishing it. Some suggest obfuscating the data or releasing only some data statistics. These approaches have, however, been shown to provide insufficient safeguards against de-anonymisation. Recently, differential privacy (DP), an approach that injects noise into the query answers to provide statistical privacy guarantees, has emerged as a solution to release sensitive data. This study investigates how to continuously release privacy-preserving histograms (or distributions) from online streams of sensitive data by combining DP and semantic web technologies. We focus on distributions, as they are the basis for many analytic applications. Specifically, we propose SihlQL, a query language that processes RDF streams in a privacy-preserving fashion. SihlQL builds on top of SPARQL and the w-event DP framework. We show how some peculiarities of w-event privacy constrain the expressiveness of SihlQL queries. Addressing these constraints, we propose an extension of w-event privacy that provides answers to a larger class of queries while preserving their privacy. To evaluate SihlQL, we implemented a prototype engine that compiles queries to Apache Flink topologies and studied its privacy properties using real-world data from an IPTV provider and an online e-commerce web site.
Citations: 3
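To make the idea of "injecting noise into the query answers" from the abstract above concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a single histogram release. It does not implement SihlQL or the w-event framework for streams; the function names are hypothetical, and it assumes neighboring datasets differ by adding or removing one record, so each bin count has sensitivity 1.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_histogram(values, bins, epsilon, rng=None):
    """Return noisy counts for `bins`, one release under epsilon-DP.

    Assumes adding or removing one record changes exactly one bin count
    by 1, so the Laplace noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    counts = Counter(values)
    scale = 1.0 / epsilon
    return {b: counts.get(b, 0) + laplace_noise(scale, rng) for b in bins}
```

A smaller `epsilon` gives stronger privacy and noisier counts. Releasing a histogram repeatedly over a stream consumes privacy budget at every release, which is the problem that stream-oriented frameworks such as w-event privacy are designed to manage.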
How Do We Create a Fantabulous Password?
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380222
Simon S. Woo
Although pronounceability can improve password memorability, most existing password generation approaches have not properly integrated the pronounceability of passwords in their designs. In this work, we demonstrate several shortfalls of current pronounceable password generation approaches, and then propose ProSemPass, a new method of generating passwords that are pronounceable and semantically meaningful. In our approach, users supply initial input words and our system improves the pronounceability and meaning of the user-provided words by automatically creating a portmanteau. To measure the strength of our approach, we use attacker models in which attackers have complete knowledge of our password generation algorithms. We measure strength in guess numbers and compare those with other existing password generation approaches. In a large-scale IRB-approved user study with 1,563 Amazon MTurkers over 9 different conditions, our approach achieves 30% higher recall than current pronounceable password approaches and is stronger than the offline guessing attack limit.
Citations: 2
The Representativeness of Automated Web Crawls as a Surrogate for Human Browsing
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380104
David Zeber, Sarah Bird, Camila Oliveira, Walter Rudametkin, I. Segall, Fredrik Wollsén, M. Lopatka
Large-scale Web crawls have emerged as the state of the art for studying characteristics of the Web. In particular, they are a core tool for online tracking research. Web crawling is an attractive approach to data collection, as crawls can be run at relatively low infrastructure cost and don’t require handling sensitive user data such as browsing histories. However, the biases introduced by using crawls as a proxy for human browsing data have not been well studied. Crawls may fail to capture the diversity of user environments, and the snapshot view of the Web presented by one-time crawls does not reflect its constantly evolving nature, which hinders reproducibility of crawl-based studies. In this paper, we quantify the repeatability and representativeness of Web crawls in terms of common tracking and fingerprinting metrics, considering both variation across crawls and divergence from human browser usage. We quantify baseline variation of simultaneous crawls, then isolate the effects of time, cloud IP address vs. residential, and operating system. This provides a foundation to assess the agreement between crawls visiting a standard list of high-traffic websites and actual browsing behaviour measured from an opt-in sample of over 50,000 users of the Firefox Web browser. Our analysis reveals differences between the treatment of stateless crawling infrastructure and generally stateful human browsing, showing, for example, that crawlers tend to experience higher rates of third-party activity than human browser users on loading pages from the same domains.
Citations: 28
Next Point-of-Interest Recommendation on Resource-Constrained Mobile Devices
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380170
Qinyong Wang, Hongzhi Yin, Tong Chen, Zi Huang, Hao Wang, Yanchang Zhao, Nguyen Quoc Viet Hung
In the modern tourism industry, next point-of-interest (POI) recommendation is an important mobile service as it effectively aids hesitating travelers to decide the next POI to visit. Currently, most next POI recommender systems are built upon a cloud-based paradigm, where the recommendation models are trained and deployed on the powerful cloud servers. When a recommendation request is made by a user via mobile devices, the current contextual information will be uploaded to the cloud servers to help the well-trained models generate personalized recommendation results. However, in reality, this paradigm heavily relies on high-quality network connectivity, and is subject to high energy footprint in the operation and increasing privacy concerns among the public. To bypass these defects, we propose a novel Light Location Recommender System (LLRec) to perform next POI recommendation locally on resource-constrained mobile devices. To make LLRec fully compatible with the limited computing resources and memory space, we leverage FastGRNN, a lightweight but effective gated Recurrent Neural Network (RNN) as its main building block, and significantly compress the model size by adopting the tensor-train composition in the embedding layer. As a compact model, LLRec maintains its robustness via an innovative teacher-student training framework, where a powerful teacher model is trained on the cloud to learn essential knowledge from available contextual data, and the simplified student model LLRec is trained under the guidance of the teacher model. The final LLRec is downloaded and deployed on users’ mobile devices to generate accurate recommendations solely utilizing users’ local data. As a result, LLRec significantly reduces the dependency on cloud servers, thus allowing for next POI recommendation in a stable, cost-effective and secure way. Extensive experiments on two large-scale recommendation datasets further demonstrate the superiority of our proposed solution.
Citations: 70