A Novel Macro-Micro Fusion Network for User Representation Learning on Mobile Apps
Shuqing Bian, Wayne Xin Zhao, Kun Zhou, Xu Chen, Jing Cai, Yancheng He, Xingji Luo, Ji-Rong Wen
The evolution of mobile apps has greatly changed the way we live, and it has become increasingly important to understand and model users on mobile apps. Instead of focusing on a single app in isolation, it has become a popular paradigm to study user behavior across various mobile apps in a symbiotic environment. In this paper, we study the task of user representation learning with both macro and micro interaction data on mobile apps, where macro and micro interactions refer to user-app interactions and user-item interactions within a specific app, respectively. By combining the two kinds of user data, we expect to derive a more comprehensive and robust user representation model for mobile apps. In order to effectively fuse information across the macro and micro views, we propose a novel macro-micro fusion network for user representation learning on mobile apps. With a Transformer architecture as the base model, we design a representation fusion component that captures category-based semantic alignment at the user level. After such semantic alignment, the information across the two views can be adaptively fused in our approach. Furthermore, we adopt mutual information maximization to derive a self-supervised loss that enhances the learning of our fusion network. Extensive experiments with three downstream tasks on two real-world datasets demonstrate the effectiveness of our approach.
{"title":"A Novel Macro-Micro Fusion Network for User Representation Learning on Mobile Apps","authors":"Shuqing Bian, Wayne Xin Zhao, Kun Zhou, Xu Chen, Jing Cai, Yancheng He, Xingji Luo, Ji-rong Wen","doi":"10.1145/3442381.3450109","DOIUrl":"https://doi.org/10.1145/3442381.3450109","url":null,"abstract":"The evolution of mobile apps has greatly changed the way that we live. It becomes increasingly important to understand and model the users on mobile apps. Instead of focusing on some specific app alone, it has become a popular paradigm to study the user behavior on various mobile apps in a symbiotic environment. In this paper, we study the task of user representation learning with both macro and micro interaction data on mobile apps. Specifically, macro and micro interaction refer to user-app interaction or user-item interaction on some specific app, respectively. By combining the two kinds of user data, it is expected to derive a more comprehensive, robust user representation model on mobile apps. In order to effectively fuse the information across the macro and micro views, we propose a novel macro-micro fusion network for user representation learning on mobile apps. With a Transformer architecture as the base model, we design a representation fusion component that is able to capture the category-based semantic alignment at the user level. After such semantic alignment, the information across the two views can be adaptively fused in our approach. Furthermore, we adopt mutual information maximization to derive a self-supervised loss to enhance the learning of our fusion network. Extensive experiments with three downstream tasks on two real-world datasets have demonstrated the effectiveness of our approach.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117193466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Investigation of Identity-Account Inconsistency in Single Sign-On
Guannan Liu, Xing Gao, Haining Wang
Single Sign-On (SSO) has been widely adopted for online authentication due to its favorable usability and security. However, it also introduces a single point of failure, since all service providers fully trust the identity of a user created by the SSO identity provider. In this paper, we investigate the identity-account inconsistency threat, a new SSO vulnerability that can cause the compromise of online accounts. The vulnerability exists because current SSO systems rely heavily on a user’s email address to bind an account to a real identity, while ignoring the fact that email addresses might be reused by other users. We reveal that under SSO authentication, such inconsistency allows an adversary controlling a reused email address to take over associated online accounts without knowing any credentials, such as passwords. Specifically, we first conduct a measurement study of the account management policies of multiple cloud email providers, showing the feasibility of acquiring previously used email accounts. We further perform a systematic study of 100 popular websites using the Google business email service with our own domain address, and demonstrate that most online accounts can be compromised by exploiting this inconsistency vulnerability. To shed light on email reuse in the wild, we analyze the commonly used naming conventions that lead to a wide existence of potential email address collisions, and conduct a case study of the account policies of U.S. universities. Finally, we propose several useful practices for end users, service providers, and identity providers to protect against this identity-account inconsistency threat.
{"title":"An Investigation of Identity-Account Inconsistency in Single Sign-On","authors":"Guannan Liu, Xing Gao, Haining Wang","doi":"10.1145/3442381.3450085","DOIUrl":"https://doi.org/10.1145/3442381.3450085","url":null,"abstract":"Single Sign-On (SSO) has been widely adopted for online authentication due to its favorable usability and security. However, it also introduces a single point of failure since all service providers fully trust the identity of a user created by the SSO identity provider. In this paper, we investigate the identity-account inconsistency threat, a new SSO vulnerability that can cause the compromise of online accounts. The vulnerability exists because current SSO systems highly rely on a user’s email address to bind an account with a real identity, but ignore the fact that email addresses might be reused by other users. We reveal that under the SSO authentication, such inconsistency allows an adversary controlling a reused email address to take over associated online accounts without knowing any credentials like passwords. Specifically, we first conduct a measurement study on the account management policies for multiple cloud email providers, showing the feasibility of acquiring previously used email accounts. We further perform a systematic study on 100 popular websites using the Google business email service with our own domain address and demonstrate that most online accounts can be compromised by exploiting this inconsistency vulnerability. To shed light on email reuse in the wild, we analyze the commonly used naming conventions that lead to a wide existence of potential email address collisions, and conduct a case study on the account policies of U.S. universities. Finally, we propose several useful practices for end-users, service providers, and identity providers to protect against this identity-account inconsistency threat.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117353666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CrowdGP: a Gaussian Process Model for Inferring Relevance from Crowd Annotations
Dan Li, Zhaochun Ren, E. Kanoulas
Test collections have been a crucial factor in developing information retrieval systems. Constructing a test collection requires annotators to assess the relevance of massive numbers of query-document pairs. Relevance annotations acquired through crowdsourcing platforms alleviate the enormous cost of this process, but they are often noisy. Existing models for denoising crowd annotations mostly assume that annotations are generated independently, and design a probabilistic graphical model of the annotation generation process based on this assumption. In reality, however, tasks are often correlated with each other. Whether and how task correlation helps in denoising crowd annotations remains an understudied problem. In this paper, we relax the independence assumption to model task correlation in terms of relevance. We propose a new crowd annotation generation model named CrowdGP, in which true relevance labels, annotator competence, annotator bias towards relevancy, task difficulty, and task bias towards relevancy are modelled through a Gaussian process and multiple Gaussian variables, respectively. The CrowdGP model shows better performance in inferring true relevance labels than state-of-the-art baselines on two crowdsourced relevance datasets. The experiments also demonstrate its effectiveness in selecting new tasks for future crowd annotation, which is a new functionality of CrowdGP. Ablation studies indicate that this effectiveness is attributable to the modelling of task correlation based on the auxiliary information of tasks and the prior relevance information of documents to queries.
{"title":"CrowdGP: a Gaussian Process Model for Inferring Relevance from Crowd Annotations","authors":"Dan Li, Zhaochun Ren, E. Kanoulas","doi":"10.1145/3442381.3450047","DOIUrl":"https://doi.org/10.1145/3442381.3450047","url":null,"abstract":"Test collection has been a crucial factor for developing information retrieval systems. Constructing a test collection requires annotators to assess the relevance of massive query-document pairs. Relevance annotations acquired through crowdsourcing platforms alleviate the enormous cost of this process but they are often noisy. Existing models to denoise crowd annotations mostly assume that annotations are generated independently, based on which a probabilistic graphical model is designed to model the annotation generation process. However, tasks are often correlated with each other in reality. It is an understudied problem whether and how task correlation helps in denoising crowd annotations. In this paper, we relax the independence assumption to model task correlation in terms of relevance. We propose a new crowd annotation generation model named CrowdGP, where true relevance labels, annotator competence, annotator’s bias towards relevancy, task difficulty, and task’s bias towards relevancy are modelled through a Gaussian process and multiple Gaussian variables respectively. The CrowdGP model shows better performance in terms of interring true relevance labels compared with state-of-the-art baselines on two crowdsourcing datasets on relevance. The experiments also demonstrate its effectiveness in terms of selecting new tasks for future crowd annotation, which is a new functionality of CrowdGP. Ablation studies indicate that the effectiveness is attributed to the modelling of task correlation based on the auxiliary information of tasks and the prior relevance information of documents to queries.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127267322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepVista: 16K Panoramic Cinema on Your Mobile Device
Wenxiao Zhang, Feng Qian, B. Han, P. Hui
In this paper, we design, implement, and evaluate DeepVista, which is to our knowledge the first consumer-class system that streams panoramic videos far beyond ultra-high-definition resolution (up to 16K) to mobile devices, offering truly immersive experiences. Such an immense resolution makes streaming video-on-demand (VoD) content extremely resource-demanding. To tackle this challenge, DeepVista introduces a novel framework that leverages an edge server to perform efficient, intelligent, and quality-guaranteed content transcoding, by extracting from panoramic frames the viewport stream that will be delivered to the client. To support real-time transcoding of 16K content, DeepVista employs several key mechanisms, such as dual-GPU acceleration, lossless viewport extraction, deep viewport prediction, and a two-layer streaming design. Our extensive evaluations using real users’ viewport movement data indicate that DeepVista outperforms existing solutions, and can smoothly stream 16K panoramic videos to mobile devices over diverse wireless networks including WiFi, LTE, and mmWave 5G.
{"title":"DeepVista: 16K Panoramic Cinema on Your Mobile Device","authors":"Wenxiao Zhang, Feng Qian, B. Han, P. Hui","doi":"10.1145/3442381.3449829","DOIUrl":"https://doi.org/10.1145/3442381.3449829","url":null,"abstract":"In this paper, we design, implement, and evaluate , which is to our knowledge the first consumer-class system that streams panoramic videos far beyond the ultra high-definition resolution (up to 16K) to mobile devices, offering truly immersive experiences. Such an immense resolution makes streaming video-on-demand (VoD) content extremely resource-demanding. To tackle this challenge, introduces a novel framework that leverages an edge server to perform efficient, intelligent, and quality-guaranteed content transcoding, by extracting from panoramic frames the viewport stream that will be delivered to the client. To support real-time transcoding of 16K content, employs several key mechanisms such as dual-GPU acceleration, lossless viewport extraction, deep viewport prediction, and a two-layer streaming design. Our extensive evaluations using real users’ viewport movement data indicate that outperforms existing solutions, and can smoothly stream 16K panoramic videos to mobile devices over diverse wireless networks including WiFi, LTE, and mmWave 5G.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126153943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GuideBoot: Guided Bootstrap for Deep Contextual Bandits in Online Advertising
Feiyang Pan, Haoming Li, Xiang Ao, Wei Wang, Yanrong Kang, Ao Tan, Qing He
The exploration/exploitation (E&E) dilemma lies at the core of interactive systems such as online advertising, for which contextual bandit algorithms have been proposed. Bayesian approaches provide guided exploration via uncertainty estimation, but their applicability is often limited by over-simplified assumptions. Non-Bayesian bootstrap methods, on the other hand, can handle complex problems by using deep reward models, but lack clear guidance for the exploration behavior. Developing a practical method for complex deep contextual bandits thus remains largely unsolved. In this paper, we introduce Guided Bootstrap (GuideBoot), which combines the best of both worlds. GuideBoot provides explicit guidance for the exploration behavior by training multiple models over both real samples and noisy samples with fake labels, where the noise is added according to the predictive uncertainty. The proposed method is efficient, as it can make decisions on the fly using only one randomly chosen model, and it is also effective, as we show that it can be viewed as a non-Bayesian approximation of Thompson sampling. Moreover, we extend it to an online version that can learn solely from streaming data, which is favored in real applications. Extensive experiments on both synthetic tasks and large-scale advertising environments show that GuideBoot achieves significant improvements over previous state-of-the-art methods.
{"title":"GuideBoot: Guided Bootstrap for Deep Contextual Banditsin Online Advertising","authors":"Feiyang Pan, Haoming Li, Xiang Ao, Wei Wang, Yanrong Kang, Ao Tan, Qing He","doi":"10.1145/3442381.3449987","DOIUrl":"https://doi.org/10.1145/3442381.3449987","url":null,"abstract":"The exploration/exploitation (E&E) dilemma lies at the core of interactive systems such as online advertising, for which contextual bandit algorithms have been proposed. Bayesian approaches provide guided exploration via uncertainty estimation, but the applicability is often limited due to over-simplified assumptions. Non-Bayesian bootstrap methods, on the other hand, can apply to complex problems by using deep reward models, but lack a clear guidance to the exploration behavior. It still remains largely unsolved to develop a practical method for complex deep contextual bandits. In this paper, we introduce Guided Bootstrap (GuideBoot), combining the best of both worlds. GuideBoot provides explicit guidance to the exploration behavior by training multiple models over both real samples and noisy samples with fake labels, where the noise is added according to the predictive uncertainty. The proposed method is efficient as it can make decisions on-the-fly by utilizing only one randomly chosen model, but is also effective as we show that it can be viewed as a non-Bayesian approximation of Thompson sampling. Moreover, we extend it to an online version that can learn solely from streaming data, which is favored in real applications. Extensive experiments on both synthetic tasks and large-scale advertising environments show that GuideBoot achieves significant improvements against previous state-of-the-art methods.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115555573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-level Connection Enhanced Representation Learning for Script Event Prediction
Lihong Wang, Juwei Yue, Shu Guo, Jiawei Sheng, Qianren Mao, Zhenyu Chen, Shenghai Zhong, Chen Li
Script event prediction (SEP) aims to choose the correct subsequent event from a candidate list, given a chain of ordered context events. Event representation learning has been proposed and successfully applied to this task. Most previous representation learning methods focus mainly on coarse-grained connections at the event or chain level, while ignoring finer-grained connections between events. Here we propose a novel framework that enhances the representation learning of events by mining their connections at multiple granularity levels, including the argument, event, and chain levels. In our method, we first employ a masked self-attention mechanism to model the relations between the components of events (i.e., arguments). Then, a directed graph convolutional network is further utilized to model the temporal or causal relations between events in the chain. Finally, we introduce an attention module over the context event chain, so as to dynamically aggregate context events with respect to the current candidate event. By fusing these threefold connections in a unified framework, our approach learns more accurate argument/event/chain representations, and thus achieves better prediction performance. Comprehensive experimental results on the public New York Times corpus demonstrate that our model outperforms state-of-the-art baselines. Our code is available at https://github.com/YueAWu/MCer.
{"title":"Multi-level Connection Enhanced Representation Learning for Script Event Prediction","authors":"Lihong Wang, Juwei Yue, Shu Guo, Jiawei Sheng, Qianren Mao, Zhenyu Chen, Shenghai Zhong, Chen Li","doi":"10.1145/3442381.3449894","DOIUrl":"https://doi.org/10.1145/3442381.3449894","url":null,"abstract":"Script event prediction (SEP) aims to choose a correct subsequent event from a candidate list, given a chain of ordered context events. Event representation learning has been proposed and successfully applied to this task. Most previous methods learning representations mainly focus on coarse-grained connections at event or chain level, while ignoring more fine-grained connections between events. Here we propose a novel framework which can enhance the representation learning of events by mining their connections at multiple granularity levels, including argument level, event level and chain level. In our method, we first employ a masked self-attention mechanism to model the relations between the components of events (i.e. arguments). Then, a directed graph convolutional network is further utilized to model the temporal or causal relations between events in the chain. Finally, we introduce an attention module to the context event chain, so as to dynamically aggregate context events with respect to the current candidate event. By fusing threefold connections in a unified framework, our approach can learn more accurate argument/event/chain representations, and thus leads to better prediction performance. Comprehensive experiment results on public New York Times corpus demonstrate that our model outperforms other state-of-the-art baselines. Our code is available in https://github.com/YueAWu/MCer.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116094873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interventions for Softening Can Lead to Hardening of Opinions: Evidence from a Randomized Controlled Trial
A. Spitz, A. Abu-Akel, R. West
Motivated by the goal of designing interventions for softening polarized opinions on the Web, and building on results from psychology, we hypothesized that people would be moved more easily towards opposing opinions when the latter were voiced by a celebrity they like rather than by a celebrity they dislike. We tested this hypothesis in a survey-based randomized controlled trial in which we exposed respondents to opinions that were randomly assigned to one of four spokespersons each: a disagreeing but liked celebrity, a disagreeing and disliked celebrity, a disagreeing expert, and an agreeing but disliked celebrity. After the treatment, we measured changes in the respondents’ opinions, empathy towards the spokespersons, and use of affective language. Contrary to our hypothesis, no softening of opinions was observed, regardless of the respondents’ attitudes towards the celebrity. Instead, we found strong evidence of a hardening of pretreatment opinions when a disagreeing opinion was attributed to an expert or when an agreeing opinion was attributed to a disliked celebrity. We also observed a pronounced reduction in empathy for disagreeing spokespersons, indicating a punitive response. The only celebrity for whom, on average, empathy remained unchanged was the one who agreed, even though they were disliked. Our results can be explained as a reaction to violated expectations towards experts and as a perceived breach of trust by liked celebrities. They confirm that naïve mediation strategies may not yield the intended results, and they show how difficult it is to depolarize, and how easy it is to further polarize or provoke emotional responses.
{"title":"Interventions for Softening Can Lead to Hardening of Opinions: Evidence from a Randomized Controlled Trial","authors":"A. Spitz, A. Abu-Akel, R. West","doi":"10.1145/3442381.3450019","DOIUrl":"https://doi.org/10.1145/3442381.3450019","url":null,"abstract":"Motivated by the goal of designing interventions for softening polarized opinions on the Web, and building on results from psychology, we hypothesized that people would be moved more easily towards opposing opinions when the latter were voiced by a celebrity they like, rather than by a celebrity they dislike. We tested this hypothesis in a survey-based randomized controlled trial in which we exposed respondents to opinions that were randomly assigned to one of four spokespersons each: a disagreeing but liked celebrity, a disagreeing and disliked celebrity, a disagreeing expert, and an agreeing but disliked celebrity. After the treatment, we measured changes in the respondents’ opinions, empathy towards the spokespersons, and use of affective language. Unlike hypothesized, no softening of opinions was observed regardless of the respondents’ attitudes towards the celebrity. Instead, we found strong evidence of a hardening of pretreatment opinions when a disagreeing opinion was attributed to an expert or when an agreeing opinion was attributed to a disliked celebrity. We also observed a pronounced reduction in empathy for disagreeing spokespersons, indicating a punitive response. The only celebrity for whom, on average, empathy remained unchanged was the one who agreed, even though they were disliked. Our results could be explained as a reaction to violated expectations towards experts and as a perceived breach of trust by liked celebrities. They confirm that naïve strategies at mediation may not yield intended results, and how difficult it is to depolarize—and how easy it is to further polarize or provoke emotional responses.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116133686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling Sparse Information Diffusion at Scale via Lazy Multivariate Hawkes Processes
Maximilian Nickel, Matt Le
Multivariate Hawkes Processes (MHPs) are an important class of temporal point processes that have enabled key advances in understanding and predicting social information systems. However, due to their complex modeling of temporal dependencies, MHPs have proven notoriously difficult to scale, which has limited their application to relatively small domains. In this work, we propose a novel model and computational approach to overcome this important limitation. By exploiting a characteristic sparsity pattern in real-world diffusion processes, we show that our approach allows computing the exact likelihood and gradients of an MHP independently of the ambient dimension of the underlying network. We show on synthetic and real-world datasets that our method not only achieves state-of-the-art modeling results, but also improves runtime performance by multiple orders of magnitude on sparse event sequences. In combination with easily interpretable latent variables and influence structures, this allows us to analyze diffusion processes in networks at a previously unattainable scale.
{"title":"Modeling Sparse Information Diffusion at Scale via Lazy Multivariate Hawkes Processes","authors":"Maximilian Nickel, Matt Le","doi":"10.1145/3442381.3450094","DOIUrl":"https://doi.org/10.1145/3442381.3450094","url":null,"abstract":"Multivariate Hawkes Processes (MHPs) are an important class of temporal point processes that have enabled key advances in understanding and predicting social information systems. However, due to their complex modeling of temporal dependencies, MHPs have proven to be notoriously difficult to scale, what has limited their applications to relatively small domains. In this work, we propose a novel model and computational approach to overcome this important limitation. By exploiting a characteristic sparsity pattern in real-world diffusion processes, we show that our approach allows to compute the exact likelihood and gradients of an MHP – independently of the ambient dimensions of the underlying network. We show on synthetic and real-world datasets that our method does not only achieve state-of-the-art modeling results, but also improves runtime performance by multiple orders of magnitude on sparse event sequences. In combination with easily interpretable latent variables and influence structures, this allows us to analyze diffusion processes in networks at previously unattainable scale.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122891235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Not All Features Are Equal: Discovering Essential Features for Preserving Prediction Privacy
FatemehSadat Mireshghallah, Mohammadkazem Taram, A. Jalali, Ahmed T. Elthakeb, D. Tullsen, H. Esmaeilzadeh
When receiving machine learning services from the cloud, the provider does not need to receive all features; in fact, only a subset of the features is necessary for the target prediction task. Discerning this subset is the key problem of this work. We formulate it as a gradient-based perturbation maximization problem that discovers this subset in the input feature space with respect to the functionality of the prediction model used by the provider. After identifying the subset, our framework, Cloak, suppresses the remaining features using utility-preserving constant values that are discovered through a separate gradient-based optimization process. We show that Cloak does not necessarily require collaboration from the service provider beyond its normal service, and can be applied in scenarios where we only have black-box access to the service provider’s model. We theoretically guarantee that Cloak’s optimizations reduce the upper bound of the Mutual Information (MI) between the data and the sifted representations that are sent out. Experimental results show that Cloak reduces the mutual information between the input and the sifted representations by 85.01%, with only a negligible reduction in utility (1.42%). In addition, we show that Cloak greatly diminishes adversaries’ ability to learn and infer non-conducive features.
{"title":"Not All Features Are Equal: Discovering Essential Features for Preserving Prediction Privacy","authors":"FatemehSadat Mireshghallah, Mohammadkazem Taram, A. Jalali, Ahmed T. Elthakeb, D. Tullsen, H. Esmaeilzadeh","doi":"10.1145/3442381.3449965","DOIUrl":"https://doi.org/10.1145/3442381.3449965","url":null,"abstract":"When receiving machine learning services from the cloud, the provider does not need to receive all features; in fact, only a subset of the features are necessary for the target prediction task. Discerning this subset is the key problem of this work. We formulate this problem as a gradient-based perturbation maximization method that discovers this subset in the input feature space with respect to the functionality of the prediction model used by the provider. After identifying the subset, our framework, Cloak, suppresses the rest of the features using utility-preserving constant values that are discovered through a separate gradient-based optimization process. We show that Cloak does not necessarily require collaboration from the service provider beyond its normal service, and can be applied in scenarios where we only have black-box access to the service provider’s model. We theoretically guarantee that Cloak’s optimizations reduce the upper bound of the Mutual Information (MI) between the data and the sifted representations that are sent out. Experimental results show that Cloak reduces the mutual information between the input and the sifted representations by 85.01% with only negligible reduction in utility (1.42%). In addition, we show that Cloak greatly diminishes adversaries’ ability to learn and infer non-conducive features.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114280864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Completing Missing Prevalence Rates for Multiple Chronic Diseases by Jointly Leveraging Both Intra- and Inter-Disease Population Health Data Correlations
Yujie Feng, Jiangtao Wang, Yasha Wang, A. Helal
Population health data are becoming more publicly available on the Internet than ever before. Such datasets offer great potential for enabling a better understanding of the health of populations and for informing health professionals and policy makers in resource planning, disease management, and prevention across different regions. However, due to the laborious and high-cost nature of collecting such public health data, it is commonplace to find many missing entries in these datasets, which challenges the utility of the data and hinders reliable analysis and understanding. To tackle this problem, this paper proposes a deep-learning-based approach, called Compressive Population Health (CPH), to infer and recover (i.e., complete) the missing prevalence-rate entries for multiple chronic diseases. The key insight of CPH is the combined exploitation of both intra-disease and inter-disease correlations. Specifically, we first propose a Convolutional Neural Network (CNN) based approach to extract and model both of these two types of correlations, and then adopt a Generative Adversarial Network (GAN) based prevalence inference model to jointly fuse them and facilitate the recovery of missing entries. We extensively evaluate the inference model on real-world public health datasets that are publicly available on the Web. Results show that our inference method outperforms baseline methods in various settings, with significantly improved accuracy (from 14.8% to 9.1%).
{"title":"Completing Missing Prevalence Rates for Multiple Chronic Diseases by Jointly Leveraging Both Intra- and Inter-Disease Population Health Data Correlations","authors":"Yujie Feng, Jiangtao Wang, Yasha Wang, A. Helal","doi":"10.1145/3442381.3449811","DOIUrl":"https://doi.org/10.1145/3442381.3449811","url":null,"abstract":"Population health data are becoming more and more publicly available on the Internet than ever before. Such datasets offer a great potential for enabling a better understanding of the health of populations, and inform health professionals and policy makers for better resource planning, disease management and prevention across different regions. However, due to the laborious and high-cost nature of collecting such public health data, it is a common place to find many missing entries on these datasets, which challenges the utility of the data and hinders reliable analysis and understanding. To tackle this problem, this paper proposes a deep-learning-based approach, called Compressive Population Health (CPH), to infer and recover (to complete) the missing prevalence rate entries of multiple chronic diseases. The key insight of CPH relies on the combined exploitation of both intra-disease and inter-disease correlation opportunities. Specifically, we first propose a Convolutional Neural Network (CNN) based approach to extract and model both of these two types of correlations, and then adopt a Generative Adversarial Network (GAN) based prevalence inference model to jointly fuse them to facility the prevalence rates data recovery of missing entries. We extensively evaluate the inference model based on real-world public health datasets publicly available on the Web. Results show that our inference method outperforms other baseline methods in various settings and with a significantly improved accuracy (from 14.8% to 9.1%).","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129508231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}