
Proceedings of The Web Conference 2020: Latest Papers

Adaptive Probabilistic Word Embedding
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380147
Shuangyin Li, Yu Zhang, Rong Pan, Kaixiang Mo
Word embeddings have been widely used and proven to be effective in many natural language processing and text modeling tasks. An ambiguous word can have very different semantics in different contexts, a phenomenon called polysemy. Most existing works generate a single embedding for each word, while a few build a limited number of embeddings to represent the different meanings of each word. However, it is hard to determine the exact number of senses for each word, as word meaning depends on context. To address this problem, we propose a novel Adaptive Probabilistic Word Embedding (APWE) model, in which word polysemy is defined over a latent interpretable semantic space. Specifically, each word is first represented by an embedding in the latent semantic space; the APWE model then adaptively adjusts and updates this embedding according to the context to obtain a tailored word embedding. Empirical comparisons with state-of-the-art models demonstrate the superiority of the proposed APWE model.
Citations: 7
In Opinion Holders’ Shoes: Modeling Cumulative Influence for View Change in Online Argumentation
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380302
Zhen Guo, Zhe Zhang, Munindar P. Singh
Understanding how people change their views during multiparty argumentative discussions is important in applications that involve human communication, e.g., in social media and education. Existing research focuses on lexical features of individual comments, dynamics of discussions, or the personalities of participants but deemphasizes the cumulative influence of the interplay of comments by different participants on a participant’s mindset. We address the task of predicting the points where a user’s view changes given an entire discussion, thereby tackling the confusion due to multiple plausible alternatives when considering the entirety of a discussion. We make the following contributions. (1) Through a human study, we show that modeling a user’s perception of comments is crucial in predicting persuasiveness. (2) We present a sequential model for cumulative influence that captures the interplay between comments as both local and nonlocal dependencies, and demonstrate its capability of selecting the most effective information for changing views. (3) We identify contextual and interactive features and propose sequence structures to incorporate these features. Our empirical evaluation using a Reddit Change My View dataset shows that contextual and interactive features are valuable in predicting view changes, and a sequential model notably outperforms the nonsequential baseline models.
Citations: 8
On the Robustness of Cascade Diffusion under Node Attacks
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380028
Alvis Logins, Yuchen Li, Panagiotis Karras
How can we assess a network’s ability to maintain its functionality under attacks? Network robustness has been studied extensively in the case of deterministic networks. However, applications such as online information diffusion and the behavior of networked publics raise the question of robustness in probabilistic networks. We propose three novel robustness measures for networks hosting a diffusion under the Independent Cascade (IC) model, susceptible to node attacks. The outcome of such a process depends on the selection of its initiators, or seeds, by the seeder, as well as on two factors outside the seeder’s discretion: the attack strategy and the probabilistic diffusion outcome. We consider three levels of seeder awareness regarding these two uncontrolled factors, and evaluate the network’s viability aggregated over all possible extents of node attacks. We introduce novel algorithms from building blocks found in previous works to evaluate the proposed measures. A thorough experimental study with synthetic and real, scale-free and homogeneous networks establishes that these algorithms are effective and efficient, while the proposed measures highlight differences among networks in terms of robustness and the surprise they furnish when attacked. Last, we devise a new measure of diffusion entropy that can inform the design of probabilistically robust networks.
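To make the setting concrete, the following is a minimal Python sketch of an Independent Cascade simulation in which attacked nodes are removed before the diffusion starts. It is not one of the paper's three robustness measures or its algorithms; the graph representation, the function names, and the Monte Carlo estimate of expected spread are assumptions made for illustration.

```python
import random


def independent_cascade(graph, seeds, removed=frozenset(), rng=None):
    """Run one Independent Cascade simulation and return the activated nodes.

    graph:   dict mapping node -> list of (neighbor, activation_probability).
    seeds:   nodes chosen by the seeder to start the diffusion.
    removed: attacked nodes (illustrative assumption: an attacked node can
             neither be activated nor forward the diffusion).
    """
    rng = rng or random.Random()
    active = {s for s in seeds if s not in removed}
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v in removed or v in active:
                    continue
                # Each newly active node gets one chance to activate each neighbor.
                if rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, removed=frozenset(), runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(
        len(independent_cascade(graph, seeds, removed, rng)) for _ in range(runs)
    )
    return total / runs
```

Comparing expected_spread with and without a removed set gives a rough sense of how much a particular attack hurts a particular seed choice; the measures in the paper aggregate over attack extents and seeder awareness levels, which this sketch does not attempt.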
Citations: 12
Asymptotic Behavior of Sequence Models
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380044
Flavio Chierichetti, Ravi Kumar, A. Tomkins
In this paper we study the limiting dynamics of a sequential process that generalizes Pólya’s urn. This process has also been studied in the context of language generation, discrete choice, repeat consumption, and models for the web graph. The process we study generates future items by copying from past items. It is parameterized by a sequence of weights describing how much to prefer copying from recent versus more distant locations. We show that, if the weight sequence follows a power law with exponent α ∈ [0, 1), then the sequences generated by the model tend toward a limiting behavior in which the eventual frequency of each token in the alphabet attains a limit. Moreover, in the case α > 2, we show that the sequence converges to one token being chosen infinitely often, with every other token being chosen only a constant number of times.
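The copying process described above can be simulated with a short Python sketch. The abstract does not spell out the exact weight normalization or initial condition, so this assumes that the item at distance d back from the current position is copied with probability proportional to d^(-α), and that the sequence starts from a caller-supplied list of tokens; generate_sequence and frequencies are hypothetical names.

```python
import random
from collections import Counter


def generate_sequence(initial, steps, alpha, seed=0):
    """Extend `initial` by `steps` items, each copied from a past position.

    Assumption for illustration: the item d steps back (d = 1 is the most
    recent) is chosen with probability proportional to d ** -alpha.
    With alpha = 0 every past item is equally likely, which corresponds to
    a classical Polya urn.
    """
    rng = random.Random(seed)
    seq = list(initial)
    for _ in range(steps):
        n = len(seq)
        distances = range(1, n + 1)
        weights = [d ** -alpha for d in distances]
        d = rng.choices(distances, weights=weights, k=1)[0]
        seq.append(seq[n - d])
    return seq


def frequencies(seq):
    """Empirical frequency of each token in the sequence."""
    counts = Counter(seq)
    return {token: c / len(seq) for token, c in counts.items()}
```

Running generate_sequence with a small α and a large α and comparing the output of frequencies gives an informal feel for the two regimes in the abstract. Each step recomputes all weights, so the sketch is quadratic in the sequence length and only suited to small experiments.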
Citations: 2
ROSE: Role-based Signed Network Embedding
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380038
Amin Javari, Tyler Derr, Pouya Esmailian, Jiliang Tang, K. Chang
In real-world networks, nodes might have more than one type of relationship. Signed networks are an important class of such networks, consisting of two types of relations: positive and negative. Recently, embedding signed networks has attracted increasing attention; it is more challenging than embedding classic networks, since nodes are connected by paths with multiple types of links. Existing works capture the complex relationships by relying on social theories. However, this approach has major drawbacks, including the incompleteness and inaccuracy of such theories. Thus, we propose network-transformation-based embedding to address these shortcomings. The core idea is that rather than directly finding the similarities of two nodes from the complex paths connecting them, we can obtain their similarities through simple paths connecting their different roles. We employ this idea to build our proposed embedding technique, which can be described in three steps: (1) the input directed signed network is transformed into an unsigned bipartite network, with each node mapped to a set of nodes we denote as role-nodes. Each role-node captures a certain role that a node in the original network plays; (2) the network of role-nodes is embedded; and (3) the original network is encoded by aggregating the embedding vectors of role-nodes. Our experiments show that the proposed technique substantially outperforms existing models.
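Steps (1) and (3) can be illustrated with a small Python sketch. The abstract does not define the role-nodes, so the four roles used here (out+, out-, in+, in-), the edge mapping, and aggregation by concatenation are assumptions chosen for illustration and should not be read as the construction used in ROSE. Step (2), embedding the role-node network, is left to any unsigned network embedding method.

```python
def to_role_bipartite(signed_edges):
    """Map a directed signed network to an unsigned bipartite role network.

    signed_edges: iterable of (source, target, sign) with sign in {+1, -1}.
    Hypothetical mapping: the edge u -> v with sign s becomes an unsigned
    edge between role-node (u, "out" + s) and role-node (v, "in" + s), so
    every edge links an out-role to an in-role (hence bipartite).
    """
    edges = set()
    for u, v, sign in signed_edges:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        tag = "+" if sign == 1 else "-"
        edges.add(((u, "out" + tag), (v, "in" + tag)))
    return edges


def aggregate(node, role_embeddings, dim):
    """Encode an original node by concatenating its role-node vectors.

    role_embeddings: dict mapping (node, role) -> list of `dim` floats,
    produced by whatever embedding method was run on the role network.
    Roles the node never plays contribute a zero vector.
    """
    vector = []
    for role in ("out+", "out-", "in+", "in-"):
        vector.extend(role_embeddings.get((node, role), [0.0] * dim))
    return vector
```

Concatenation keeps the roles separable in the final vector; averaging or a learned combination would be equally plausible choices under the same assumptions.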
Citations: 30
Apophanies or Epiphanies? How Crawlers Impact Our Understanding of the Web
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380113
Syed Suleman Ahmad, Muhammad Daniyal Dar, Muhammad Fareed Zaffar, N. Vallina-Rodriguez, Rishab Nithyanand
Data generated by web crawlers has formed the basis for much of our current understanding of the Internet. However, not all crawlers are created equal and crawlers generally find themselves trading off between computational overhead, developer effort, data accuracy, and completeness. Therefore, the choice of crawler has a critical impact on the data generated and knowledge inferred from it. In this paper, we conduct a systematic study of the trade-offs presented by different crawlers and the impact that these can have on various types of measurement studies. We make the following contributions: First, we conduct a survey of all research published since 2015 in the premier security and Internet measurement venues to identify and verify the repeatability of crawling methodologies deployed for different problem domains and publication venues. Next, we conduct a qualitative evaluation of a subset of all crawling tools identified in our survey. This evaluation allows us to draw conclusions about the suitability of each tool for specific types of data gathering. Finally, we present a methodology and a measurement framework to empirically highlight the differences between crawlers and how the choice of crawler can impact our understanding of the web.
Citations: 22
Differentially Private Stream Processing for the Semantic Web
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380265
Daniele Dell'Aglio, A. Bernstein
Data often contains sensitive information, which poses a major obstacle to publishing it. Some suggest obfuscating the data or releasing only some data statistics. These approaches have, however, been shown to provide insufficient safeguards against de-anonymisation. Recently, differential privacy (DP), an approach that injects noise into the query answers to provide statistical privacy guarantees, has emerged as a solution for releasing sensitive data. This study investigates how to continuously release privacy-preserving histograms (or distributions) from online streams of sensitive data by combining DP and semantic web technologies. We focus on distributions, as they are the basis for many analytic applications. Specifically, we propose SihlQL, a query language that processes RDF streams in a privacy-preserving fashion. SihlQL builds on top of SPARQL and the w-event DP framework. We show how some peculiarities of w-event privacy constrain the expressiveness of SihlQL queries. Addressing these constraints, we propose an extension of w-event privacy that provides answers to a larger class of queries while preserving their privacy. To evaluate SihlQL, we implemented a prototype engine that compiles queries to Apache Flink topologies and studied its privacy properties using real-world data from an IPTV provider and an online e-commerce web site.
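The basic idea of injecting noise into a histogram can be sketched in Python with the standard Laplace mechanism. This is only the single-release building block: it does not implement w-event privacy, budget allocation across a stream, or anything specific to SihlQL. The function name and the default sensitivity of 1 (each individual contributing to at most one bin) are assumptions for illustration.

```python
import random
from collections import Counter


def noisy_histogram(items, bins, epsilon, sensitivity=1.0, seed=None):
    """Release one histogram with Laplace noise added to every bin.

    items:       observed values (one per individual, by assumption).
    bins:        the bin labels to release, including empty ones.
    epsilon:     privacy budget spent on this single release.
    sensitivity: assumed L1 sensitivity of the histogram.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    counts = Counter(items)
    rate = epsilon / sensitivity  # Laplace scale is sensitivity / epsilon
    noisy = {}
    for b in bins:
        # The difference of two i.i.d. exponentials is Laplace distributed.
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        noisy[b] = counts.get(b, 0) + noise
    return noisy
```

A smaller epsilon gives stronger privacy and noisier counts. Releasing a histogram repeatedly over a stream consumes budget on every release, which is the problem that frameworks such as w-event DP are designed to manage.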
Citations: 3
The Structure of Social Influence in Recommender Networks
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380020
P. Analytis, D. Barkoczi, Philipp Lorenz-Spreen, Stefan M. Herzog
People’s ability to influence others’ opinion on matters of taste varies greatly—both offline and in recommender systems. What are the mechanisms underlying these striking differences? Using the weighted k-nearest neighbors algorithm (k-nn) to represent an array of social learning strategies, we show—leveraging methods from network science—how the k-nn algorithm gives rise to networks of social influence in six real-world domains of taste. We show three novel results that apply both to offline advice taking and online recommender settings. First, influential individuals have mainstream tastes and high dispersion in their taste similarity with others. Second, the fewer people an individual or algorithm consults (i.e., the lower k is) or the larger the weight placed on the opinions of more similar others, the smaller the group of people with substantial influence. Third, the influence networks emerging from deploying the k-nn algorithm are hierarchically organized. Our results shed new light on classic empirical findings in communication and network science and can help improve the understanding of social influence offline and online.
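As a rough illustration of how a weighted k-nn recommender induces an influence network, the Python sketch below gives each user one unit of "attention", splits it among that user's k most similar peers in proportion to similarity, and sums what each peer receives. The choice of cosine similarity, the clipping of negative similarities, and the influence definition are assumptions for illustration and may differ from the paper's analysis.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length rating vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def influence_scores(ratings, k):
    """Total weight each user receives as a neighbor of other users.

    ratings: dict mapping user -> list of ratings over the same items.
    For every user, the k most similar other users are consulted and the
    user's unit of attention is split among them in proportion to
    (non-negative) similarity.
    """
    influence = {user: 0.0 for user in ratings}
    for user, own in ratings.items():
        neighbors = sorted(
            ((cosine(own, other), peer) for peer, other in ratings.items() if peer != user),
            reverse=True,
        )[:k]
        total = sum(max(sim, 0.0) for sim, _ in neighbors)
        if total == 0:
            continue  # no positively similar neighbor to consult
        for sim, peer in neighbors:
            influence[peer] += max(sim, 0.0) / total
    return influence
```

Lowering k concentrates each user's attention on fewer peers, which is one way to explore the abstract's second finding on small toy data.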
Citations: 10
Leveraging Passage-level Cumulative Gain for Document Ranking
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380305
Zhijing Wu, Jiaxin Mao, Yiqun Liu, Jingtao Zhan, Yukun Zheng, Min Zhang, Shaoping Ma
Document ranking is one of the most studied but challenging problems in information retrieval (IR) research. A number of existing document ranking models capture relevance signals at the whole-document level. Recently, more and more research has begun to address this problem through fine-grained document modeling. Several works have leveraged fine-grained passage-level relevance signals in ranking models. However, most of these works focus on context-independent passage-level relevance signals and ignore the context information, which may lead to inaccurate estimation of passage-level relevance. In this paper, we investigate how information gain accumulates with passages when users sequentially read a document. We propose the context-aware Passage-level Cumulative Gain (PCG), which aggregates relevance scores of passages and avoids the need to formally split a document into independent passages. Next, we incorporate the patterns of PCG into a BERT-based sequential model called Passage-level Cumulative Gain Model (PCGM) to predict the PCG sequence. Finally, we apply PCGM to the document ranking task. Experimental results on two public ad hoc retrieval benchmark datasets show that PCGM outperforms most existing ranking models and indicate the effectiveness of PCG signals. We believe that this work contributes to improving ranking performance and providing more explainability for document ranking.
Citations: 32
Twitter User Location Inference Based on Representation Learning and Label Propagation
Pub Date : 2020-04-20 DOI: 10.1145/3366423.3380019
Hechan Tian, Meng Zhang, Xiangyang Luo, Fenlin Liu, Yaqiong Qiao
Social network user location inference technology has been widely used in various geospatial applications like public health monitoring and local advertising recommendation. Due to insufficient consideration of relationships between users and location indicative words, most existing inference methods estimate label propagation probabilities solely based on statistical features, resulting in large location inference errors. In this paper, a Twitter user location inference method based on representation learning and label propagation is proposed. Firstly, the heterogeneous connection relation graph is constructed based on relationships between Twitter users and relationships between users and location indicative words, and relationships unrelated to geographic attributes are filtered. Then, vector representations of users are learnt from the connection relation graph. Finally, label propagation probabilities between adjacent users are calculated based on vector representations, and the locations of unknown users are predicted through iterative label propagation. Experiments on two representative Twitter datasets, GeoText and TwUs, show that the proposed method can accurately calculate label propagation probabilities based on vector representations and improve the accuracy of location inference. Compared with existing typical Twitter user location inference methods, GCN and MLP-TXT+NET, the median error distance of the proposed method is reduced by 18% and 16%, respectively.
Citations: 13