Proceedings of the Web Conference 2021: Latest Papers, Page 9

Heterogeneous Graph Neural Network via Attribute Completion

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449914

Di Jin, Cuiying Huo, Chundong Liang, Liang Yang

Heterogeneous information networks (HINs), also called heterogeneous graphs, are composed of multiple types of nodes and edges and contain comprehensive information and rich semantics. Graph neural networks (GNNs), as powerful tools for graph data, have shown superior performance in network analysis. Recently, many models have been proposed to process heterogeneous graph data using GNNs, with great success. These GNN-based heterogeneous models can be interpreted as smoothing node attributes under the guidance of graph structure, which requires all nodes to have attributes. However, this requirement is hard to satisfy, as some types of nodes often have no attributes in heterogeneous graphs. Previous studies use handcrafted methods to solve this problem, which separate attribute completion from the graph learning process and, in turn, result in poor performance. In this paper, we hold that missing attributes can be acquired in a learnable manner, and propose a general framework for Heterogeneous Graph Neural Network via Attribute Completion (HGNN-AC), including pre-learning of topological embedding and attribute completion with an attention mechanism. HGNN-AC first uses existing HIN-embedding methods to obtain node topological embeddings. It then uses the topological relationship between nodes as guidance to complete attributes for attribute-less nodes by weighted aggregation of the attributes of attributed nodes. Our completion mechanism can be easily combined with an arbitrary GNN-based heterogeneous model, making the whole system end-to-end. We conduct extensive experiments on three real-world heterogeneous graphs. The results demonstrate the superiority of the proposed framework over state-of-the-art baselines.
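The completion step described in this abstract (attention over attributed neighbours, followed by weighted aggregation of their attributes) can be illustrated with a small sketch. This is not the authors' implementation: the function name `complete_attributes`, the use of a plain dot product as the attention score, and the toy inputs are all assumptions made for illustration, and the topological embeddings are taken as given.

```python
import math


def complete_attributes(target_emb, neighbor_embs, neighbor_attrs):
    """Fill in a missing attribute vector for one attribute-less node.

    target_emb     : topological embedding of the attribute-less node
    neighbor_embs  : topological embeddings of its attributed neighbours
    neighbor_attrs : attribute vectors of those neighbours (same order)

    Hypothetical helper: the dot-product score below stands in for
    whatever learned attention function a real model would use.
    """
    # Score each attributed neighbour by embedding similarity.
    scores = [sum(a * b for a, b in zip(target_emb, emb)) for emb in neighbor_embs]

    # Softmax over neighbours, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted aggregation of the neighbours' attribute vectors.
    dim = len(neighbor_attrs[0])
    completed = [
        sum(w * attr[d] for w, attr in zip(weights, neighbor_attrs))
        for d in range(dim)
    ]
    return completed, weights
```

When two neighbours are equally similar to the target node, the completed vector is simply the mean of their attributes; a more similar neighbour receives a larger weight.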

Citations: 104

High-Dimensional Sparse Cross-Modal Hashing with Fine-Grained Similarity Embedding

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449798

Yongxin Wang, Zhen-Duo Chen, Xin Luo, Xin-Shun Xu

Recently, following discoveries in neurobiology, high-dimensional sparse hashing has attracted increasing attention. In contrast with general hashing, which generates low-dimensional hash codes, high-dimensional sparse hashing maps inputs into a higher-dimensional space and generates sparse hash codes, achieving superior performance. However, sparse hashing has not yet been fully studied in the hashing literature. For example, it remains open how to fully exploit the power of sparse coding in cross-modal retrieval tasks, and how to solve the binary and sparse constraints discretely so as to avoid the quantization error problem. Motivated by these issues, in this paper we present an efficient sparse hashing method, High-dimensional Sparse Cross-modal Hashing (HSCH). It not only takes the high-level semantic similarity of data into consideration, but also properly exploits the low-level feature similarity. Specifically, we theoretically design a fine-grained similarity with two critical fusion rules. We then take advantage of sparse codes to embed the fine-grained similarity into the to-be-learnt hash codes. Moreover, an efficient discrete optimization algorithm is proposed to solve the binary and sparse constraints, reducing the quantization error. As a result, the model becomes much more trainable, and the learnt hash codes are more discriminative. More importantly, retrieval with HSCH is as efficient as with general hashing methods. Extensive experiments on three widely used datasets demonstrate the superior performance of HSCH compared with several state-of-the-art cross-modal hashing approaches.
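To make the phrase "maps inputs into a higher-dimensional space and generates sparse hash codes" concrete, the sketch below produces a code that is longer than the input and has only a few bits set. It is a generic random-projection plus winner-take-all scheme, not the HSCH optimization; the function name `sparse_hash`, the Gaussian projection, and the parameters `out_dim`, `k`, and `seed` are assumptions chosen for illustration.

```python
import random


def sparse_hash(x, out_dim, k, seed=0):
    """Return a binary code of length out_dim with exactly k bits set.

    Hypothetical illustration only: a fixed random projection stands in
    for the learned mapping a real method would optimise.
    """
    rng = random.Random(seed)
    # Random projection into a higher-dimensional space (out_dim > len(x)).
    projection = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(out_dim)]
    activations = [sum(w * v for w, v in zip(row, x)) for row in projection]

    # Winner-take-all: only the k largest activations become 1 bits.
    winners = sorted(range(out_dim), key=lambda i: activations[i], reverse=True)[:k]
    code = [0] * out_dim
    for i in winners:
        code[i] = 1
    return code
```

Because only `k` positions are non-zero, two codes can be compared by intersecting their sets of active indices, which keeps retrieval cheap even when `out_dim` is large.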

Citations: 10

Highly Liquid Temporal Interaction Graph Embeddings

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449921

Huidi Chen, Yun Xiong, Yangyong Zhu, Philip S. Yu

Capturing the topological and temporal information of interactions and predicting future interactions are crucial for many domains, such as social networks, financial transactions, and e-commerce. With the advent of co-evolutional models, the mutual influence between interacting users and items is captured. However, existing models only update the interaction information of nodes along the timeline. This causes information asymmetry, where nodes updated early often have much less information than the most recently updated nodes. Information asymmetry is essentially a blockage of information flow. We propose HILI (Highly Liquid Temporal Interaction Graph Embeddings) to predict highly liquid embeddings on temporal interaction graphs. Our embedding model makes interaction information highly liquid, without information asymmetry. Least-recently-used-based and frequency-based windows are used to determine the priority of the nodes that receive the latest interaction information. HILI updates node embeddings with attention layers, which learn the correlation between nodes and update node embeddings simply and quickly. In addition, HILI introduces a carefully designed self-linear layer, a linear layer initialized with a novel method. The self-linear layer reduces the expected space of the predicted embedding of the next interacting node and makes the predicted embedding focus more on relevant nodes. We illustrate the geometric meaning of a self-linear layer in the paper. Furthermore, experimental results show that our model outperforms other state-of-the-art temporal interaction prediction models.
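The abstract does not spell out how the least-recently-used window is built, so the following is only a generic sketch of a fixed-size LRU window over node identifiers. The class name `LRUWindow` and its methods are hypothetical, and how HILI actually combines this with the frequency-based window is not covered here.

```python
from collections import OrderedDict


class LRUWindow:
    """Fixed-size window that keeps the most recently interacting nodes.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._nodes = OrderedDict()  # insertion order tracks recency

    def touch(self, node):
        """Record an interaction involving `node`."""
        if node in self._nodes:
            # Already in the window: mark it as the most recent entry.
            self._nodes.move_to_end(node)
        else:
            self._nodes[node] = True
            if len(self._nodes) > self.capacity:
                # Evict the least recently used node.
                self._nodes.popitem(last=False)

    def members(self):
        """Nodes in the window, most recently used first."""
        return list(reversed(self._nodes))
```

A model could then propagate each new interaction to the nodes returned by `members()`, giving priority to those at the front of the list.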

Citations: 8

Distilling Knowledge from Publicly Available Online EMR Data to Emerging Epidemic for Prognosis

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449855

Liantao Ma, Xinyu Ma, Junyi Gao, Xianfeng Jiao, Zhihao Yu, Chaohe Zhang, Wenjie Ruan, Yasha Wang, Wen Tang, Jiangtao Wang

Due to the characteristics of COVID-19, the epidemic develops rapidly and overwhelms health service systems worldwide. Many patients suffer from life-threatening systemic problems and need to be carefully monitored in ICUs. An intelligent prognosis can help physicians intervene early, prevent adverse outcomes, and optimize medical resource allocation, which is urgently needed, especially in this ongoing global pandemic crisis. However, in the early stage of an epidemic outbreak, the data available for analysis is limited due to the lack of effective diagnostic mechanisms, the rarity of the cases, and privacy concerns. In this paper, we propose a distilled transfer learning framework, which leverages existing publicly available online Electronic Medical Records (EMR) to enhance the prognosis for inpatients with emerging infectious diseases. It learns to embed COVID-19-related medical features based on massive existing EMR data. The transferred parameters are further trained to imitate the teacher model's representation based on distillation, which embeds the health status more comprehensively on the source dataset. We conduct Length-of-Stay prediction experiments for patients in ICUs on real-world COVID-19 datasets. The experimental results indicate that our proposed model consistently outperforms competitive baseline methods. To further verify the scalability of the proposed framework to different clinical tasks on different EMR datasets, we conduct an additional mortality prediction experiment on End-Stage Renal Disease datasets. The extensive experiments demonstrate that the proposed framework can benefit the prognosis for emerging pandemics and other diseases with limited EMR.

Citations: 20

Cross-Positional Attention for Debiasing Clicks

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450098

Honglei Zhuang, Zhen Qin, Xuanhui Wang, Michael Bendersky, Xinyu Qian, Po Hu, Dan Chary Chen

A well-known challenge in leveraging implicit user feedback such as clicks to improve real-world search services and recommender systems is its inherent bias. Most existing click models are based on the examination hypothesis of user behavior and differ in how they model the examination bias. However, they are constrained by assuming a simple position-based bias or by enforcing a sequential order on user examination behavior. These assumptions are insufficient to capture complex real-world user behaviors and generalize poorly to modern user interfaces (UIs) in web applications (e.g., results shown in a grid view). In this work, we propose a fully data-driven neural model for the examination bias, Cross-Positional Attention (XPA), which is more flexible in fitting complex user behaviors. Our model leverages the attention mechanism to effectively capture cross-positional interactions among displayed items and is applicable to arbitrary UIs. We employ XPA in a novel neural click model that can both predict clicks and estimate relevance. Our experiments on offline synthetic data sets show that XPA is robust across different click generation processes. We further apply XPA to a large-scale real-world recommender system, showing significantly better results than baselines in online A/B experiments that involve millions of users. This validates the necessity of modeling more complex user behaviors than those proposed in the literature.
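For context, the simple position-based bias that this abstract argues against is usually written as a product: the probability of a click equals the probability that the position is examined times the relevance of the item. The sketch below shows that baseline and the matching inverse-propensity correction. It is not the XPA model; the function names and the example propensities are hypothetical.

```python
def click_probability(relevance, position, examination_by_position):
    """Position-based click model: P(click) = P(examined at position) * relevance."""
    return examination_by_position[position] * relevance


def debiased_relevance(clicks, impressions, position, examination_by_position):
    """Inverse-propensity estimate of relevance from clicks at a fixed position.

    The observed click-through rate is divided by the examination
    probability of the position where the item was shown.
    """
    click_through_rate = clicks / impressions
    return click_through_rate / examination_by_position[position]
```

With assumed examination probabilities of 1.0, 0.5, and 0.25 for the first three positions, an item shown at the second position with a 40% click-through rate would be assigned a relevance estimate twice as high as its raw click rate. A model such as XPA replaces the fixed per-position table with a learned function of all displayed items.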

Citations: 21

RETA: A Schema-Aware, End-to-End Solution for Instance Completion in Knowledge Graphs

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449883

Paolo Rosso, Dingqi Yang, Natalia Ostapuk, P. Cudré-Mauroux

Knowledge Graph (KG) completion has been widely studied to tackle the incompleteness issue (i.e., missing facts) in modern KGs. A fact in a KG is represented as a triplet (h, r, t) linking two entities h and t via a relation r. Existing work mostly considers link prediction to solve this problem, i.e., given two elements of a triplet, predicting the missing one, such as (h, r, ?). This task, however, makes a strong assumption about the two given elements of a triplet: they have to be correlated, otherwise the predictions are meaningless, as in (Marie Curie, headquarters location, ?). In addition, the KG completion problem has also been formulated as a relation prediction task, i.e., predicting relations r for a given entity h. Without predicting t, however, this task is a step away from the ultimate goal of KG completion. Against this background, this paper studies an instance completion task that suggests r-t pairs for a given h, i.e., (h, ?, ?). We propose an end-to-end solution called RETA (as it suggests the Relation and Tail for a given head entity) consisting of two components: a RETA-Filter and a RETA-Grader. More precisely, our RETA-Filter first generates candidate r-t pairs for a given h by extracting and leveraging the schema of a KG; our RETA-Grader then evaluates and ranks the candidate r-t pairs, considering the plausibility of both the candidate triplet and its corresponding schema using a newly designed KG embedding model. We evaluate our methods against a sizable collection of state-of-the-art techniques on three real-world KG datasets. Results show that our RETA-Filter generates high-quality candidate r-t pairs, outperforming the best baseline techniques while reducing the candidate size by 10.61%-84.75% under the same candidate quality guarantees. Moreover, our RETA-Grader also significantly outperforms state-of-the-art link prediction techniques on the instance completion task, by 16.25%-65.92% across different datasets.
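The idea of using a KG's schema to propose r-t pairs for a head entity can be sketched as follows. This is a simplified reading of the filtering step, not the RETA-Filter itself: the functions `extract_schema` and `candidate_pairs`, the representation of the schema as (head type, relation, tail type) tuples, and the toy entity types are all assumptions for illustration.

```python
def extract_schema(triples, entity_types):
    """Collect the (head type, relation, tail type) patterns seen in the KG."""
    schema = set()
    for head, relation, tail in triples:
        for head_type in entity_types.get(head, ()):
            for tail_type in entity_types.get(tail, ()):
                schema.add((head_type, relation, tail_type))
    return schema


def candidate_pairs(head, triples, entity_types, schema):
    """Suggest (relation, tail) pairs for `head` that are compatible with the schema.

    Hypothetical filter: a pair is kept when the head's type, the relation,
    and the tail's type form a pattern observed somewhere in the KG, and the
    resulting fact is not already known.
    """
    known = {(r, t) for h, r, t in triples if h == head}
    candidates = set()
    for head_type in entity_types.get(head, ()):
        for schema_head, relation, schema_tail in schema:
            if schema_head != head_type:
                continue
            for entity, types in entity_types.items():
                if entity != head and schema_tail in types and (relation, entity) not in known:
                    candidates.add((relation, entity))
    return candidates
```

In a toy KG where people have a "born in" relation to cities and organizations have a "headquarters location", a person such as Marie Curie would only receive "born in" candidates, so the meaningless (Marie Curie, headquarters location, ?) query from the abstract never arises. A grader would then rank the surviving pairs.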

Citations: 9

Consistent Sampling Through Extremal Process

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449955

P. Li, Xiaoyun Li, G. Samorodnitsky, Weijie Zhao

The Jaccard similarity has been widely used in search and machine learning, especially in industrial practice. For binary (0/1) data, the Jaccard similarity is often called the “resemblance”, and minwise hashing has been the standard tool for computing resemblances in massive data. For general weighted data, the commonly used sampling algorithm for computing the (weighted) Jaccard similarity is Consistent Weighted Sampling (CWS). A convenient (and perhaps also mysterious) implementation of CWS is the so-called “0-bit CWS” published in KDD 2015 [31], which we refer to in this paper as the “relaxed CWS”; it was purely an empirical observation without theoretical justification. The difficulty in analyzing the “relaxed CWS” is due to a complicated probability problem, which we could not resolve at this point. In this paper, we propose using extremal processes to generate samples for estimating the Jaccard similarity. Surprisingly, the proposed “extremal sampling” (ES) scheme makes it possible to analyze the “relaxed ES” variant. Through some novel probability endeavours, we are able to rigorously compute the bias of the “relaxed ES”, which, to a good extent, explains why the “relaxed ES” works so well and when it does not, in extreme corner cases. Interestingly, compared with CWS, the resultant algorithm only involves counting and does not need the sophisticated mathematical operations required by CWS. It is therefore not surprising that the proposed ES scheme is noticeably faster than CWS. Although ES is different from CWS (and from other algorithms in the literature for estimating the Jaccard similarity), in retrospect ES is indeed closely related to CWS. This paper provides the much-needed insight that connects CWS with extremal processes. This insight may help in understanding CWS (and its variants) and might help develop new algorithms for similarity estimation in future research.
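The quantities this abstract refers to can be written down directly. The sketch below computes the exact weighted Jaccard similarity (sum of element-wise minima over sum of maxima), the binary resemblance, and a basic minwise-hashing estimate of the resemblance. It does not implement CWS or the proposed extremal sampling; the function names, the linear hash family modulo a prime, and the default of 256 hash functions are assumptions, and the linear hashes only approximate true random permutations.

```python
import random

_PRIME = 2_147_483_647  # 2**31 - 1; items must be integers in [0, _PRIME)


def weighted_jaccard(u, v):
    """Exact weighted Jaccard of two non-negative vectors of equal length."""
    numerator = sum(min(a, b) for a, b in zip(u, v))
    denominator = sum(max(a, b) for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def resemblance(a, b):
    """Exact Jaccard similarity of two sets (the binary case)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signature(items, num_hashes=256, seed=0):
    """Minimum hash value of a non-empty integer set under each hash function."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, _PRIME), rng.randrange(_PRIME)) for _ in range(num_hashes)]
    return [min((a * x + b) % _PRIME for x in items) for a, b in params]


def estimate_resemblance(sig_a, sig_b):
    """Fraction of hash functions on which the two minima collide."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

Both signatures must be built with the same `seed` and `num_hashes` so that the hash functions line up; the estimate then approaches the exact resemblance as the number of hash functions grows.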

Citations: 15

Effective Named Entity Recognition with Boundary-aware Bidirectional Neural Networks

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449995

Fei Li, Z. Wang, S. Hui, L. Liao, Dandan Song, Jing Xu

Named Entity Recognition (NER) is a fundamental problem in Natural Language Processing and has received much research attention. Although current neural NER approaches have achieved state-of-the-art performance, they still suffer from one or more of the following three problems in their architectures: (1) boundary tag sparsity; (2) lack of global decoding information; and (3) boundary error propagation. In this paper, we propose a novel Boundary-aware Bidirectional Neural Networks (Ba-BNN) model to tackle these problems for neural-based NER. The proposed Ba-BNN model builds on the structure of pointer networks to tackle the first problem, boundary tag sparsity. Moreover, we use a boundary-aware binary classifier to capture global decoding information as input to the decoders. In the Ba-BNN model, we propose to use two decoders to process the information in two different directions (i.e., left-to-right and right-to-left). The final hidden states of the left-to-right decoder are obtained by incorporating the hidden states of the right-to-left decoder in the decoding process. In addition, a boundary retraining strategy is proposed to help reduce the boundary error propagation caused by the pointer networks in boundary detection and entity classification. We have conducted extensive experiments on three NER benchmark datasets. The results show that the proposed Ba-BNN model outperforms the current state-of-the-art models.

Citations: 15

CoResident Evil: Covert Communication In The Cloud With Lambdas

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450100

Anil Yelam, Shibani Subbareddy, Keerthana Ganesan, S. Savage, A. Mirian

“Serverless” cloud services, such as AWS lambdas, are one of the fastest growing segments of the cloud services market. These services are popular in part due to their light-weight nature and flexibility in scheduling and cost; however, the security issues associated with serverless computing are not well understood. In this work, we explore the feasibility of constructing a practical covert channel from lambdas. We establish that fast co-residence detection for lambdas is key to enabling such a covert channel, and proceed to develop a reliable and scalable co-residence detector based on the memory bus hardware. Our technique enables dynamic discovery of co-resident lambdas and is fast, executing in a matter of seconds. We evaluate our approach for correctness and scalability, and use it to establish covert channels and perform data transfer on AWS lambdas. We show that we can establish hundreds of individual covert channels for every 1000 lambdas deployed, and each of those channels can send data at a rate of 00 bits per second, thus demonstrating that covert communication via lambdas is entirely feasible.

Citations: 8

Superways: A Datacenter Topology for Incast-heavy workloads

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449966

Hamed Rezaei, Balajee Vamanan

Several important datacenter applications cause incast congestion, which severely degrades flow completion times of short flows and throughput of long flows. Further, because most flows are short and the incast duration is shorter than typical round-trip times, reactive mechanisms that rely on congestion control are not effective. While modern datacenter topologies provide high bisection bandwidth to support all-to-all traffic, incast is fundamentally a many-to-one traffic pattern, and therefore, requires deep buffers or high bandwidth at the network edge. We propose Superways, a heterogeneous datacenter topology that provides higher bandwidth for some servers to absorb incasts, as incasts occur only at a small number of servers that aggregate responses from other senders. Our design is based on the key observation that a small subset of servers which aggregate responses are likely to be network bound, whereas most other servers that communicate only with random servers are not. Superways can be implemented over many of the existing datacenter topologies and can be expanded flexibly without incurring high cost and cabling complexity. We also provide a heuristic for scheduling jobs in our topology to fully utilize the extra capacity. Using a real CloudLab implementation and using ns-3 simulations, we show that Superways significantly improves flow completion times and throughput over existing datacenter topologies. We also analyze cost and cabling complexity, and discuss how to expand our topology.
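To make the many-to-one argument concrete, here is a rough Python sketch of a toy fluid model of incast. It is not the paper's model or its scheduling heuristic. The function name, the assumption that all senders start at the same instant, and the example numbers (32 senders, 64 KiB bursts, 10 and 40 Gbit/s links) are hypothetical and chosen only for illustration.

```python
def peak_backlog_bytes(n_senders, burst_bytes, sender_rate_bps, receiver_rate_bps):
    """Peak queue size (bytes) at the receiver's edge port in a toy fluid model.

    Assumptions (illustrative only): every sender starts at the same instant,
    each transmits `burst_bytes` at `sender_rate_bps`, and the edge port
    drains toward the receiver at `receiver_rate_bps`.
    """
    arrival_rate_bps = n_senders * sender_rate_bps
    if arrival_rate_bps <= receiver_rate_bps:
        return 0.0  # the receiver link keeps up, so no queue builds

    # Time during which all senders are transmitting simultaneously.
    burst_duration_s = burst_bytes * 8 / sender_rate_bps
    arrived_bytes = n_senders * burst_bytes
    drained_bytes = receiver_rate_bps * burst_duration_s / 8
    return arrived_bytes - drained_bytes


if __name__ == "__main__":
    burst = 64 * 1024  # bytes per sender (hypothetical)
    # Uniform 10 Gbit/s links: roughly 31 of the 32 bursts must be buffered.
    print(peak_backlog_bytes(32, burst, 10e9, 10e9))
    # A 40 Gbit/s link at the aggregating server lowers the peak backlog.
    print(peak_backlog_bytes(32, burst, 10e9, 40e9))
```

In this simplified model the backlog shrinks only when the receiver-side link gets faster or the buffer absorbs the burst, which matches the abstract's point that incast calls for deep buffers or high bandwidth at the network edge.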

Citations: 3