Proceedings of the Web Conference 2021: Latest Articles

CoopEdge: A Decentralized Blockchain-based Platform for Cooperative Edge Computing

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449994

Liang Yuan, Qiang He, Siyu Tan, Bo Li, Jiangshan Yu, Feifei Chen, Hai Jin, Yun Yang

Edge computing (EC) has recently emerged as a novel computing paradigm that offers users low-latency services. Because their limited physical size constrains their computing resources, edge servers operating independently cannot always handle all incoming computation tasks in a timely manner. They often need to cooperate through peer-offloading. Deployed and managed by different stakeholders, edge servers operate in an untrusted environment. Trust and incentive are the two main issues that challenge cooperative computing between them. Another unique challenge in the EC environment is to facilitate trust and incentive in a decentralized manner. To tackle these challenges systematically, this paper proposes CoopEdge, a novel blockchain-based decentralized platform, to drive and support cooperative edge computing. On CoopEdge, an edge server can publish a computation task for other edge servers to contend for. A winner is selected from the candidate edge servers based on their reputations. After that, the edge servers reach a consensus to record the task-execution performance on the blockchain. We implement CoopEdge based on Hyperledger Sawtooth and evaluate it experimentally against a baseline and two state-of-the-art implementations in a simulated EC environment. The results validate the usefulness of CoopEdge and demonstrate its performance.
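
The reputation-based selection described in this abstract can be illustrated with a minimal Python sketch. The function names `select_winner` and `update_reputation`, the 0-to-1 score scale, and the exponential-moving-average update are assumptions made for illustration only; the abstract does not give CoopEdge's actual reputation formula or its Hyperledger Sawtooth transaction logic.

```python
from typing import Dict, List


def select_winner(reputations: Dict[str, float], candidates: List[str]) -> str:
    """Pick the candidate edge server with the highest recorded reputation.

    Ties are broken by server id so the choice is deterministic.
    """
    if not candidates:
        raise ValueError("no candidate edge servers")
    return max(candidates, key=lambda s: (reputations.get(s, 0.0), s))


def update_reputation(old: float, performance: float, weight: float = 0.2) -> float:
    """Blend an agreed performance score (0..1) into the old reputation.

    Hypothetical rule: an exponential moving average, not the paper's formula.
    """
    return (1.0 - weight) * old + weight * performance


# Hypothetical ledger state: reputation per edge server.
reputations = {"edge-a": 0.90, "edge-b": 0.75, "edge-c": 0.60}

# Only edge-b and edge-c contend for the published task.
winner = select_winner(reputations, ["edge-b", "edge-c"])

# After execution, the agreed performance score is folded into the record.
reputations[winner] = update_reputation(reputations[winner], performance=1.0)
```

In a real deployment the update would be written to the ledger only after consensus; here a plain dictionary stands in for that shared state.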

Citations: 79

NTAM: Neighborhood-Temporal Attention Model for Disk Failure Prediction in Cloud Platforms

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449867

Chuan Luo, Pu Zhao, Bo Qiao, Youjiang Wu, Hongyu Zhang, Wei Wu, Weihai Lu, Yingnong Dang, S. Rajmohan, Qingwei Lin, Dongmei Zhang

With the rapid deployment of cloud platforms, high service reliability is of critical importance. An industrial cloud platform contains a huge number of disks, and disk failure is a common cause of service unreliability. In recent years, many machine learning based disk failure prediction approaches have been proposed, and they can predict disk failures based on disk status data before the failures actually happen. In this way, proactive actions can be taken in advance to improve service reliability. However, existing approaches treat each disk individually and do not explore the influence of the neighboring disks. In this paper, we propose Neighborhood-Temporal Attention Model (NTAM), a novel deep learning based approach to disk failure prediction. When predicting whether or not a disk will fail in the near future, NTAM utilizes not only a disk’s own status data but also its neighbors’ status data. Moreover, NTAM includes a novel attention-based temporal component to capture the temporal nature of the disk status data. In addition, we propose a data enhancement method, called Temporal Progressive Sampling (TPS), to handle the extreme data imbalance issue. We evaluate NTAM on a public dataset as well as two industrial datasets collected from millions of disks in Microsoft Azure. Our experimental results show that NTAM significantly outperforms state-of-the-art competitors. Our empirical evaluations also indicate the effectiveness of the neighborhood-aware component and the temporal component underlying NTAM, as well as the effectiveness of TPS. More encouragingly, we have successfully applied NTAM and TPS to Microsoft cloud platforms (including Microsoft Azure and Microsoft 365) and obtained benefits in industrial practice.

Citations: 15

Estimation of Fair Ranking Metrics with Incomplete Judgments

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450080

Ömer Kirnap, Fernando Diaz, Asia J. Biega, Michael D. Ekstrand, Ben Carterette, Emine Yilmaz

There is increasing attention to evaluating the fairness of search system ranking decisions. Fairness metrics often consider the membership of items in particular groups, typically identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large-scale systems. In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate even with a very limited number of labeled items. We evaluate our approach using both simulated and real-world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation.
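
The idea of an unbiased estimate from a small labeled sample can be sketched as follows. This is a generic Horvitz-Thompson estimator of one group's total exposure, assuming each item's label is sampled with a known inclusion probability; it is not the estimator from the paper, whose sampling strategy and four metrics are not detailed in the abstract. The names `exposure` and `estimate_group_exposure` and the logarithmic discount are illustrative choices.

```python
import math
from typing import Dict, List, Optional


def exposure(rank: int) -> float:
    """Logarithmic position discount, as used in DCG-style metrics."""
    return 1.0 / math.log2(rank + 1)


def estimate_group_exposure(
    ranking: List[str],
    labels: Dict[str, Optional[str]],
    inclusion_prob: Dict[str, float],
    group: str,
) -> float:
    """Horvitz-Thompson estimate of the total exposure received by `group`.

    Only items whose group label was sampled (label is not None) contribute,
    each weighted by the inverse of its inclusion probability.
    """
    total = 0.0
    for position, item in enumerate(ranking, start=1):
        label = labels.get(item)
        if label is None:
            continue  # unlabeled item: it was not drawn into the sample
        if label == group:
            total += exposure(position) / inclusion_prob[item]
    return total


# Hypothetical example: four ranked items, each labeled with probability 0.5.
ranking = ["d1", "d2", "d3", "d4"]
probs = {item: 0.5 for item in ranking}
sampled = {"d1": "A", "d2": None, "d3": None, "d4": "B"}
estimate = estimate_group_exposure(ranking, sampled, probs, group="A")
```

Dividing each sampled contribution by its inclusion probability is what makes the expectation over samples equal the fully labeled value; an individual sample can still land above or below it.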

Citations: 27

Sketch-based Algorithms for Approximate Shortest Paths in Road Networks

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450083

Gaurav Aggarwal, Sreenivas Gollapudi, Raghavender, A. Sinop

Constructing efficient data structures (distance oracles) for fast computation of shortest paths and other connectivity measures in graphs has been a promising area of study in computer science [23, 24, 28]. In this paper, we propose very efficient algorithms, based on a distance oracle, for computing approximate shortest paths and alternate paths in road networks. Specifically, we adopt a distance oracle construction that exploits the existence of small separators in such networks. In other words, the existence of a small cut in a graph admits a partitioning of the graph into balanced components with a small number of inter-component edges. We demonstrate the efficacy of our algorithm by using it to find near optimal shortest paths and show that it also has the desired properties of well-studied goal-oriented path search algorithms such as ALT [12]. We further demonstrate the use of our distance oracle to produce multiple alternative routes in addition to the shortest path. Finally, we empirically demonstrate that our method, while exploring few edges, produces high quality alternates with respect to metrics such as optimality-loss and diversity of paths.
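
A toy version of a separator-based distance oracle may help make the idea concrete. The sketch below assumes a connected, undirected graph with non-negative weights and a separator that is already known; it stores distances from each separator node to all nodes and answers a query with the best route through the separator. It is an illustration of the general principle, not the construction used in the paper, and the names `build_sketches` and `query` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Single-source shortest-path distances for non-negative edge weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def build_sketches(graph: Graph, separator: List[str]) -> Dict[str, Dict[str, float]]:
    """For each separator node s, store the distance from s to every node."""
    return {s: dijkstra(graph, s) for s in separator}


def query(sketches: Dict[str, Dict[str, float]], u: str, v: str) -> float:
    """Upper bound on d(u, v): the best route through any separator node.

    The bound is exact when u and v lie in different components, because
    every u-v path must then cross the separator.
    """
    return min(d[u] + d[v] for d in sketches.values())


# Hypothetical road network: {a, b} and {c, d} are separated by node s.
edges = [("a", "b", 1.0), ("b", "s", 2.0), ("a", "s", 4.0),
         ("s", "c", 1.0), ("c", "d", 3.0), ("s", "d", 5.0)]
graph: Graph = {}
for x, y, w in edges:
    graph.setdefault(x, []).append((y, w))
    graph.setdefault(y, []).append((x, w))

sketches = build_sketches(graph, ["s"])
cross_component = query(sketches, "a", "d")
```

For two nodes on the same side of the separator the query only gives an upper bound, which is why practical oracles apply the decomposition recursively inside each component.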

Citations: 3

Diversification-Aware Learning to Rank using Distributed Representation

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449831

Le Yan, Zhen Qin, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky

Existing work on search result diversification typically falls into the “next document” paradigm, that is, selecting the next document based on the ones already chosen. A sequential process of selecting documents one-by-one is naturally modeled in learning-based approaches. However, such a process makes the learning difficult because there are an exponential number of ranking lists to consider. Sampling is usually used to reduce the computational complexity, but this makes the learning less effective. In this paper, we propose a soft version of the “next document” paradigm in which we associate each document with an approximate rank, and thus the subtopics covered prior to a document can also be estimated. We show that, based on these estimates, we can derive differentiable diversification-aware losses, which are smooth approximations of diversity metrics like α-NDCG. We further propose to optimize the losses in the learning-to-rank setting using neural distributed representations of queries and documents. Experiments are conducted on the public benchmark TREC datasets. By comparing with an extensive list of baseline methods, we show that our Diversification-Aware LEarning-TO-Rank (DALETOR) approaches outperform them by a large margin, while being much simpler during learning and inference.
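
The "approximate rank" idea can be sketched with a sigmoid-based soft rank, from which a smooth α-DCG follows. This is a minimal pure-Python illustration under assumed definitions (sigmoid comparisons with a temperature, binary subtopic labels); the exact losses used by DALETOR are not given in the abstract, and the names `approx_ranks` and `soft_alpha_dcg` are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def approx_ranks(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Smooth rank: 1 plus the soft count of documents scored higher."""
    return [
        1.0 + sum(sigmoid((sj - si) / temperature)
                  for j, sj in enumerate(scores) if j != i)
        for i, si in enumerate(scores)
    ]


def soft_alpha_dcg(scores: List[float], subtopics: List[List[int]],
                   alpha: float = 0.5, temperature: float = 1.0) -> float:
    """Smooth alpha-DCG; subtopics[i][k] is 1 if document i covers subtopic k."""
    ranks = approx_ranks(scores, temperature)
    n, m = len(scores), len(subtopics[0])
    total = 0.0
    for i in range(n):
        gain = 0.0
        for k in range(m):
            # Soft count of higher-ranked documents already covering subtopic k.
            covered = sum(
                subtopics[j][k] * sigmoid((scores[j] - scores[i]) / temperature)
                for j in range(n) if j != i
            )
            gain += subtopics[i][k] * (1.0 - alpha) ** covered
        total += gain / math.log2(1.0 + ranks[i])
    return total


# Hypothetical query with three documents and two subtopics.
scores = [3.0, 2.0, 1.0]
subtopics = [[1, 0], [1, 0], [0, 1]]
sharp_ranks = approx_ranks(scores, temperature=0.01)
metric = soft_alpha_dcg(scores, subtopics, alpha=0.5, temperature=0.01)
```

With a small temperature the soft quantities approach the hard ranks and the usual α-DCG; a larger temperature gives smoother gradients at the cost of a looser approximation.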

Citations: 23

Cost-Effective and Interpretable Job Skill Recommendation with Deep Reinforcement Learning

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449985

Ying Sun, Fuzhen Zhuang, Hengshu Zhu, Qing He, Hui Xiong

Nowadays, as organizations operate in very fast-paced and competitive environments, the workforce has to be agile and adapt by regularly learning new job skills. However, it is nontrivial for talents to know which skills to develop at each working stage. To this end, in this paper, we aim to develop a cost-effective recommendation system based on deep reinforcement learning, which can provide personalized and interpretable job skill recommendation for each talent. Specifically, we first design an environment to estimate the utilities of skill learning by mining massive job advertisement data, which includes a skill-matching-based salary estimator and a frequent itemset-based learning difficulty estimator. Based on the environment, we design a Skill Recommendation Deep Q-Network (SRDQN) with a multi-task structure to estimate the long-term skill learning utilities. In particular, SRDQN recommends job skills in a personalized and cost-effective manner; that is, the talents will only learn the recommended necessary skills for achieving their career goals. Finally, extensive experiments on a real-world dataset clearly validate the effectiveness and interpretability of our approach.

Citations: 15

Twin Peaks, a Model for Recurring Cascades

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449807

Matteo Almanza, Silvio Lattanzi, A. Panconesi, G. Re

Understanding information dynamics and their resulting cascades is a central topic in social network analysis. In a recent seminal work, Cheng et al. analyzed multiple cascades on Facebook over several months, and noticed that many of them exhibit a recurring behaviour: they tend to have multiple peaks of popularity, with periods of quiescence in between. In this paper, we propose the first mathematical model that provably explains this interesting phenomenon, besides exhibiting other fundamental properties of information cascades. Our model is simple and shows that it is enough to have a good clustering structure to observe this interesting recurring behaviour with a standard information diffusion model. Furthermore, we complement our theoretical analysis with an experimental evaluation where we show that our model is able to reproduce the observed phenomenon on several social networks.
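
How clustering alone can produce two peaks under a standard diffusion model can be seen in a toy simulation. The sketch below runs an independent cascade on two cliques joined by a short path; the graph shape, the function names, and the use of the independent cascade model are assumptions chosen for illustration and are not the model analyzed in the paper.

```python
import random
from typing import Dict, List, Set


def independent_cascade(graph: Dict[int, List[int]], source: int, p: float,
                        rng: random.Random) -> List[int]:
    """Return the number of newly activated nodes at each time step."""
    active: Set[int] = {source}
    frontier = [source]
    history = [1]
    while frontier:
        newly = []
        for u in frontier:
            for v in graph[u]:
                # Each newly active node gets one chance per inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly.append(v)
        if newly:
            history.append(len(newly))
        frontier = newly
    return history


def two_clusters(size: int, bridge: int) -> Dict[int, List[int]]:
    """Two cliques of `size` nodes joined by a path of `bridge` extra nodes."""
    n = 2 * size + bridge
    graph: Dict[int, List[int]] = {v: [] for v in range(n)}

    def link(a: int, b: int) -> None:
        graph[a].append(b)
        graph[b].append(a)

    for offset in (0, size + bridge):
        for a in range(offset, offset + size):
            for b in range(a + 1, offset + size):
                link(a, b)
    for a in range(size - 1, size + bridge):
        link(a, a + 1)
    return graph


graph = two_clusters(size=5, bridge=3)
# With p = 1.0 the cascade is deterministic, which keeps the example reproducible.
history = independent_cascade(graph, source=0, p=1.0, rng=random.Random(0))
```

The per-step activity rises inside the first cluster, stays low while the cascade crosses the bridge, and rises again in the second cluster. With `p` below 1 the outcome becomes random and the cascade may die out before reaching the second cluster.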

Citations: 3

HGCF: Hyperbolic Graph Convolution Networks for Collaborative Filtering

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450101

Jianing Sun, Zhaoyue Cheng, S. Zuberi, Felipe Pérez, M. Volkovs

Hyperbolic spaces offer a rich setup to learn embeddings with superior properties that have been leveraged in areas such as computer vision, natural language processing and computational biology. Recently, several hyperbolic approaches have been proposed to learn robust representations for users and items in the recommendation setting. However, these approaches don’t capture the higher order relationships that typically exist in the recommendation domain. Graph convolutional neural networks (GCNs) on the other hand excel at capturing higher order information by applying multiple levels of aggregation to local representations. In this paper we combine these frameworks in a novel way, by proposing a hyperbolic GCN model for collaborative filtering. We demonstrate that our model can be effectively learned with a margin ranking loss, and show that hyperbolic space has desirable properties under the rank margin setting. At test time, inference in our model is done using the hyperbolic distance which preserves the structure of the learned space. We conduct extensive empirical analysis on three public benchmarks and compare against a large set of baselines. Our approach achieves highly competitive results and outperforms leading baselines including the Euclidean GCN counterpart. We further study the properties of the learned hyperbolic embeddings and show that they offer meaningful insights into the data. Full code for this work is available here: https://github.com/layer6ai-labs/HGCF.
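
The two ingredients named in this abstract, a hyperbolic distance and a margin ranking loss, can be sketched in a few lines. The example below uses the Poincaré ball model as an assumption; the abstract does not say which model of hyperbolic space HGCF uses, and the released code at the URL above is the authoritative reference. The function names and the hinge form of the loss are illustrative.

```python
import math
from typing import Sequence


def poincare_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    if sq_x >= 1.0 or sq_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y)))


def margin_ranking_loss(user: Sequence[float], pos_item: Sequence[float],
                        neg_item: Sequence[float], margin: float = 1.0) -> float:
    """Hinge loss: zero once the positive item is closer than the negative by `margin`."""
    d_pos = poincare_distance(user, pos_item)
    d_neg = poincare_distance(user, neg_item)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings for one user and two items.
user = [0.0, 0.0]
liked = [0.1, 0.0]
unseen = [0.9, 0.0]
loss_good = margin_ranking_loss(user, liked, unseen)
loss_bad = margin_ranking_loss(user, unseen, liked)
```

Distances grow rapidly near the boundary of the ball, so items placed far from the origin are separated from the user by much more than their Euclidean gap suggests. At test time the same distance function can be used to rank candidate items for a user.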

Citations: 76

DAPter: Preventing User Data Abuse in Deep Learning Inference Services

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449907

Hao Wu, Xuejin Tian, Yuhang Gong, Xing Su, Minghao Li, Fengyuan Xu

The data abuse issue has risen along with the widespread development of the deep learning inference service (DLIS). Specifically, mobile users worry about their input data being labeled to secretly train new deep learning models that are unrelated to the DLIS they subscribe to. This unique issue, unlike the privacy problem, is about the rights of data owners in the context of deep learning. However, preventing data abuse is demanding when considering the usability and generality in the mobile scenario. In this work, we propose what is, to the best of our knowledge, the first data abuse prevention mechanism, called DAPter. DAPter is a user-side DLIS-input converter, which removes unnecessary information with respect to the targeted DLIS. The input data converted by DAPter maintains good inference accuracy and is difficult to label manually or automatically for new model training. DAPter’s conversion is empowered by our lightweight generative model trained with a novel loss function to minimize abusable information in the input data. Furthermore, adapting DAPter requires no change in the existing DLIS backend and models. We conduct comprehensive experiments with our DAPter prototype on mobile devices and demonstrate that DAPter can substantially raise the bar of the data abuse difficulty with little impact on the service quality and overhead.

Citations: 5

GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449937

Deepak Saini, A. Jain, Kushal Dave, Jian Jiao, Amit Singh, Ruofei Zhang, M. Varma

This paper develops the GalaXC algorithm for Extreme Classification, where the task is to annotate a document with the most relevant subset of labels from an extremely large label set. Extreme classification has been successfully applied to several real-world web-scale applications such as web search, product recommendation, and query rewriting. GalaXC identifies two critical deficiencies in leading extreme classification algorithms. First, existing approaches generally assume that documents and labels reside in disjoint sets, even though in several applications, labels and documents cohabit the same space. Second, several approaches, albeit scalable, do not utilize various forms of metadata offered by applications, such as label text and label correlations. To remedy these, GalaXC presents a framework that enables collaborative learning over joint document-label graphs at massive scales, in a way that naturally allows various auxiliary sources of information, including label metadata, to be incorporated. GalaXC also introduces a novel label-wise attention mechanism to meld high-capacity extreme classifiers with its framework. An efficient end-to-end implementation of GalaXC is presented that could be trained on a dataset with 50M labels and 97M training documents in less than 100 hours on 4 × V100 GPUs. This allowed GalaXC to not only scale to applications with several millions of labels, but also be up to 18% more accurate than leading deep extreme classifiers, while being up to 2-50× faster to train and 10× faster to predict on benchmark datasets. GalaXC is particularly well-suited to warm-start scenarios where predictions need to be made on data points with partially revealed label sets, and was found to be up to 25% more accurate than extreme classification algorithms specifically designed for warm-start settings. In A/B tests conducted on the Bing search engine, GalaXC could improve the Click Yield (CY) and coverage by 1.52% and 1.11% respectively.
Code for GalaXC is available at https://github.com/Extreme-classification/GalaXC
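
The abstract does not spell out how the label-wise attention is computed, so the following is only an illustrative sketch and not the authors' implementation. It assumes that each document has one embedding per graph hop and that every label scores those hop embeddings with its own query vector through a dot product followed by a softmax. The function names, the `label_queries` and `label_classifiers` dictionaries, and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def labelwise_attention(hop_embeddings, label_query):
    """Combine per-hop document embeddings with weights specific to one label.

    hop_embeddings: list of equal-length vectors, one per graph hop (assumed input).
    label_query: vector for a single label, used to score each hop (assumed).
    Returns (attention_weights, combined_embedding).
    """
    weights = softmax([dot(h, label_query) for h in hop_embeddings])
    dim = len(hop_embeddings[0])
    combined = [
        sum(w * h[i] for w, h in zip(weights, hop_embeddings))
        for i in range(dim)
    ]
    return weights, combined


def score_labels(hop_embeddings, label_queries, label_classifiers):
    """Score every label using its own attention-weighted document embedding."""
    scores = {}
    for label, query in label_queries.items():
        _, combined = labelwise_attention(hop_embeddings, query)
        scores[label] = dot(combined, label_classifiers[label])
    return scores


if __name__ == "__main__":
    # Toy data: two hops, two labels, two-dimensional embeddings.
    hops = [[1.0, 0.0], [0.0, 1.0]]
    queries = {"shoes": [2.0, 0.0], "laptops": [0.0, 2.0]}
    classifiers = {"shoes": [1.0, 0.0], "laptops": [0.0, 1.0]}
    print(score_labels(hops, queries, classifiers))
```

In this sketch each label sees a different mixture of the hop embeddings, which is the sense in which the attention is label-wise. A production system would batch these operations on a GPU and restrict scoring to a shortlist of candidate labels rather than looping over all of them.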

Citations: 31