Heterogeneous information networks (HINs), also called heterogeneous graphs, are composed of multiple types of nodes and edges and contain comprehensive information and rich semantics. Graph neural networks (GNNs), as powerful tools for graph data, have shown superior performance on network analysis. Recently, many models have been proposed to process heterogeneous graph data using GNNs and have achieved great success. These GNN-based heterogeneous models can be interpreted as smoothing node attributes under the guidance of graph structure, which requires all nodes to have attributes. However, this requirement is not easy to satisfy, as some types of nodes often have no attributes in heterogeneous graphs. Previous studies use handcrafted methods to solve this problem, which separate attribute completion from the graph learning process and, in turn, result in poor performance. In this paper, we hold that missing attributes can be acquired in a learnable manner, and propose a general framework for Heterogeneous Graph Neural Network via Attribute Completion (HGNN-AC), including pre-learning of topological embedding and attribute completion with an attention mechanism. HGNN-AC first uses existing HIN-embedding methods to obtain node topological embeddings. Then it uses the topological relationship between nodes as guidance to complete attributes for no-attribute nodes by weighted aggregation of the attributes from attributed nodes. Our completion mechanism can be easily combined with an arbitrary GNN-based heterogeneous model, making the whole system end-to-end. We conduct extensive experiments on three real-world heterogeneous graphs. The results demonstrate the superiority of the proposed framework over state-of-the-art baselines.
{"title":"Heterogeneous Graph Neural Network via Attribute Completion","authors":"Di Jin, Cuiying Huo, Chundong Liang, Liang Yang","doi":"10.1145/3442381.3449914","DOIUrl":"https://doi.org/10.1145/3442381.3449914","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121762973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
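The completion step described in the HGNN-AC abstract above (weighted aggregation of attributes from attributed neighbours, guided by topological embeddings) can be illustrated with a minimal sketch. The dot-product scoring, the softmax normalisation, and every name used here (`complete_attributes`, `topo_emb`, `attrs`) are assumptions made for illustration; the abstract does not specify the attention function, so this should not be read as the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def complete_attributes(node, neighbors, topo_emb, attrs):
    """Estimate attributes for `node` from its attributed neighbours.

    topo_emb: dict node -> topological embedding (list of floats)
    attrs:    dict node -> attribute vector, present only for attributed nodes
    Scoring by dot product is an assumption made for this sketch.
    """
    sources = [n for n in neighbors if n in attrs]
    if not sources:
        return None  # no attributed neighbour to aggregate from
    scores = [
        sum(a * b for a, b in zip(topo_emb[node], topo_emb[n]))
        for n in sources
    ]
    weights = softmax(scores)
    dim = len(attrs[sources[0]])
    return [
        sum(w * attrs[n][d] for w, n in zip(weights, sources))
        for d in range(dim)
    ]


# Toy graph (hypothetical): author "a1" has no attributes; papers "p1", "p2" do.
topo_emb = {"a1": [1.0, 0.0], "p1": [1.0, 0.0], "p2": [0.0, 1.0]}
attrs = {"p1": [2.0, 0.0], "p2": [0.0, 2.0]}
completed = complete_attributes("a1", ["p1", "p2"], topo_emb, attrs)
```

In this toy case "a1" is topologically closer to "p1", so the completed vector leans towards the attributes of "p1". In an end-to-end setting the weights would be learned jointly with the downstream GNN rather than fixed as here.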
Recently, with discoveries in neurobiology, high-dimensional sparse hashing has attracted increasing attention. In contrast with general hashing, which generates low-dimensional hash codes, high-dimensional sparse hashing maps inputs into a higher-dimensional space and generates sparse hash codes, achieving superior performance. However, sparse hashing has not yet been fully studied in the hashing literature. Open questions include how to fully exploit the power of sparse coding in cross-modal retrieval tasks and how to discretely solve the binary and sparse constraints so as to avoid the quantization error problem. Motivated by these issues, in this paper we present an efficient sparse hashing method, i.e., High-dimensional Sparse Cross-modal Hashing, HSCH for short. It not only takes the high-level semantic similarity of data into consideration, but also properly exploits the low-level feature similarity. Specifically, we theoretically design a fine-grained similarity with two critical fusion rules. Then we take advantage of sparse codes to embed the fine-grained similarity into the to-be-learnt hash codes. Moreover, an efficient discrete optimization algorithm is proposed to solve the binary and sparse constraints, reducing the quantization error. As a result, the model becomes much more trainable, and the learnt hash codes are more discriminative. More importantly, the retrieval complexity of HSCH is as efficient as that of general hash methods. Extensive experiments on three widely used datasets demonstrate the superior performance of HSCH compared with several state-of-the-art cross-modal hashing approaches.
{"title":"High-Dimensional Sparse Cross-Modal Hashing with Fine-Grained Similarity Embedding","authors":"Yongxin Wang, Zhen-Duo Chen, Xin Luo, Xin-Shun Xu","doi":"10.1145/3442381.3449798","DOIUrl":"https://doi.org/10.1145/3442381.3449798","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123321661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Capturing the topological and temporal information of interactions and predicting future interactions are crucial for many domains, such as social networks, financial transactions, and e-commerce. With the advent of co-evolutional models, the mutual influence between interacting users and items can be captured. However, existing models only update the interaction information of nodes along the timeline. This causes the problem of information asymmetry, where nodes updated early often have much less information than the most recently updated nodes. Information asymmetry is essentially a blockage of information flow. We propose HILI (Highly Liquid Temporal Interaction Graph Embeddings) to predict highly liquid embeddings on temporal interaction graphs. Our embedding model makes interaction information highly liquid, without information asymmetry. Dedicated least-recently-used-based and frequency-based windows are used to determine the priority of the nodes that receive the latest interaction information. HILI updates node embeddings with attention layers, which learn the correlation between nodes and update node embeddings simply and quickly. In addition, HILI introduces a self-linear layer, a linear layer initialized with a novel method. A self-linear layer reduces the expected space of the predicted embedding of the next interacting node and makes the predicted embedding focus more on relevant nodes. We illustrate the geometric meaning of a self-linear layer in the paper. Furthermore, the experimental results show that our model outperforms other state-of-the-art temporal interaction prediction models.
{"title":"Highly Liquid Temporal Interaction Graph Embeddings","authors":"Huidi Chen, Yun Xiong, Yangyong Zhu, Philip S. Yu","doi":"10.1145/3442381.3449921","DOIUrl":"https://doi.org/10.1145/3442381.3449921","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131611811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to the characteristics of COVID-19, the epidemic develops rapidly and overwhelms health service systems worldwide. Many patients suffer from life-threatening systemic problems and need to be carefully monitored in ICUs. An intelligent prognosis can help physicians intervene early, prevent adverse outcomes, and optimize the allocation of medical resources, which is urgently needed, especially in this ongoing global pandemic crisis. However, in the early stage of an epidemic outbreak, the data available for analysis is limited due to the lack of effective diagnostic mechanisms, the rarity of the cases, and privacy concerns. In this paper, we propose a distilled transfer learning framework, which leverages existing publicly available online Electronic Medical Records (EMR) to enhance the prognosis for inpatients with emerging infectious diseases. It learns to embed the COVID-19-related medical features based on massive existing EMR data. The transferred parameters are further trained to imitate the teacher model’s representation based on distillation, which embeds the health status more comprehensively on the source dataset. We conduct Length-of-Stay prediction experiments for patients in ICUs on real-world COVID-19 datasets. The experimental results indicate that our proposed model consistently outperforms competitive baseline methods. In order to further verify the scalability of the proposed framework to deal with different clinical tasks on different EMR datasets, we conduct an additional mortality prediction experiment on End-Stage Renal Disease datasets. The extensive experiments demonstrate that the proposed framework can benefit the prognosis for emerging pandemics and other diseases with limited EMR.
{"title":"Distilling Knowledge from Publicly Available Online EMR Data to Emerging Epidemic for Prognosis","authors":"Liantao Ma, Xinyu Ma, Junyi Gao, Xianfeng Jiao, Zhihao Yu, Chaohe Zhang, Wenjie Ruan, Yasha Wang, Wen Tang, Jiangtao Wang","doi":"10.1145/3442381.3449855","DOIUrl":"https://doi.org/10.1145/3442381.3449855","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"152 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120876645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Honglei Zhuang, Zhen Qin, Xuanhui Wang, Michael Bendersky, Xinyu Qian, Po Hu, Dan Chary Chen
A well-known challenge in leveraging implicit user feedback like clicks to improve real-world search services and recommender systems is its inherent bias. Most existing click models are based on the examination hypothesis in user behaviors and differ in how to model such an examination bias. However, they are constrained by assuming a simple position-based bias or enforcing a sequential order in user examination behaviors. These assumptions are insufficient to capture complex real-world user behaviors and hardly generalize to modern user interfaces (UI) in web applications (e.g., results shown in a grid view). In this work, we propose a fully data-driven neural model for the examination bias, Cross-Positional Attention (XPA), which is more flexible in fitting complex user behaviors. Our model leverages the attention mechanism to effectively capture cross-positional interactions among displayed items and is applicable to arbitrary UIs. We employ XPA in a novel neural click model that can both predict clicks and estimate relevance. Our experiments on offline synthetic data sets show that XPA is robust among different click generation processes. We further apply XPA to a large-scale real-world recommender system, showing significantly better results than baselines in online A/B experiments that involve millions of users. This validates the necessity to model more complex user behaviors than those proposed in the literature.
{"title":"Cross-Positional Attention for Debiasing Clicks","authors":"Honglei Zhuang, Zhen Qin, Xuanhui Wang, Michael Bendersky, Xinyu Qian, Po Hu, Dan Chary Chen","doi":"10.1145/3442381.3450098","DOIUrl":"https://doi.org/10.1145/3442381.3450098","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116487955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
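For context, the simple position-based assumption that the XPA abstract above argues is too restrictive can be written down in a few lines. This is a sketch of the classic examination hypothesis (a click requires both examination and relevance, with examination depending only on position), not of XPA itself; the examination probabilities and function names are hypothetical.

```python
def click_probability(relevance, examination):
    """Examination hypothesis: a click needs both examination and relevance."""
    return examination * relevance


def estimate_relevance(clicks, impressions, examination):
    """Inverse-propensity estimate of relevance from clicks at one position."""
    if impressions == 0 or examination == 0:
        return 0.0
    return (clicks / impressions) / examination


# Hypothetical examination probabilities for positions 0..2 of a ranked list.
examination_by_position = [1.0, 0.5, 0.25]

# An item with relevance 0.8 shown at position 1.
p_click = click_probability(0.8, examination_by_position[1])

# Recover relevance from 40 clicks in 100 impressions at position 1.
relevance_hat = estimate_relevance(clicks=40, impressions=100, examination=0.5)
```

Because the examination term here depends on position alone, it cannot express effects such as a neighbouring item in a grid drawing attention away, which is the kind of cross-positional interaction the abstract says XPA is designed to capture.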
Paolo Rosso, Dingqi Yang, Natalia Ostapuk, P. Cudré-Mauroux
Knowledge Graph (KG) completion has been widely studied to tackle the incompleteness issue (i.e., missing facts) in modern KGs. A fact in a KG is represented as a triplet (h, r, t) linking two entities h and t via a relation r. Existing work mostly considers link prediction to solve this problem, i.e., given two elements of a triplet, predicting the missing one, such as (h, r, ?). This task, however, makes a strong assumption about the two given elements of a triplet: they have to be correlated, as otherwise the predictions are meaningless, such as (Marie Curie, headquarters location, ?). In addition, the KG completion problem has also been formulated as a relation prediction task, i.e., predicting relations r for a given entity h. Without predicting t, however, this task is a step away from the ultimate goal of KG completion. Against this background, this paper studies an instance completion task suggesting r-t pairs for a given h, i.e., (h, ?, ?). We propose an end-to-end solution called RETA (as it suggests the Relation and Tail for a given head entity) consisting of two components: a RETA-Filter and a RETA-Grader. More precisely, our RETA-Filter first generates candidate r-t pairs for a given h by extracting and leveraging the schema of a KG; our RETA-Grader then evaluates and ranks the candidate r-t pairs, considering the plausibility of both the candidate triplet and its corresponding schema, using a newly designed KG embedding model. We evaluate our methods against a sizable collection of state-of-the-art techniques on three real-world KG datasets. Results show that our RETA-Filter generates high-quality candidate r-t pairs, outperforming the best baseline techniques while reducing the candidate size by 10.61%-84.75% under the same candidate quality guarantees. Moreover, our RETA-Grader also significantly outperforms state-of-the-art link prediction techniques on the instance completion task by 16.25%-65.92% across different datasets.
{"title":"RETA: A Schema-Aware, End-to-End Solution for Instance Completion in Knowledge Graphs","authors":"Paolo Rosso, Dingqi Yang, Natalia Ostapuk, P. Cudré-Mauroux","doi":"10.1145/3442381.3449883","DOIUrl":"https://doi.org/10.1145/3442381.3449883","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129873143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
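The idea of using a KG schema to prune candidate r-t pairs, as described for the RETA-Filter in the abstract above, can be sketched as follows. The schema representation (sets of head-type, relation, tail-type patterns observed in existing triplets), the toy data, and the function names are all assumptions for illustration; the abstract does not describe how RETA actually extracts or scores its schema.

```python
def build_schema(triples, entity_types):
    """Collect (head type, relation, tail type) patterns seen in the KG."""
    schema = set()
    for h, r, t in triples:
        for head_type in entity_types.get(h, ()):
            for tail_type in entity_types.get(t, ()):
                schema.add((head_type, r, tail_type))
    return schema


def candidate_pairs(head, entities, relations, entity_types, schema):
    """Return r-t pairs for `head` whose type pattern appears in the schema."""
    pairs = []
    for r in relations:
        for t in entities:
            if t == head:
                continue
            if any(
                (head_type, r, tail_type) in schema
                for head_type in entity_types.get(head, ())
                for tail_type in entity_types.get(t, ())
            ):
                pairs.append((r, t))
    return pairs


# Toy KG (hypothetical types and a handful of facts).
entity_types = {
    "Marie Curie": ["person"],
    "Albert Einstein": ["person"],
    "Warsaw": ["city"],
    "Ulm": ["city"],
    "UNESCO": ["organization"],
    "Paris": ["city"],
}
triples = [
    ("Albert Einstein", "place of birth", "Ulm"),
    ("UNESCO", "headquarters location", "Paris"),
]
relations = ["place of birth", "headquarters location"]

schema = build_schema(triples, entity_types)
pairs = candidate_pairs(
    "Marie Curie", list(entity_types), relations, entity_types, schema
)
```

With this toy schema, no candidate for "Marie Curie" uses "headquarters location", because that relation has only been observed with an organization as head. The surviving pairs would then be ranked by a grader; the ranking model is outside the scope of this sketch.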
The Jaccard similarity has been widely used in search and machine learning, especially in industrial practice. For binary (0/1) data, the Jaccard similarity is often called the “resemblance”, and the method of minwise hashing has been the standard tool for computing resemblances in massive data. For general weighted data, the commonly used sampling algorithm for computing the (weighted) Jaccard similarity is Consistent Weighted Sampling (CWS). A convenient (and perhaps also mysterious) implementation of CWS is the so-called “0-bit CWS” published in KDD 2015 [31], which in this paper we refer to as the “relaxed CWS”; it was purely an empirical observation without theoretical justification. The difficulty in analyzing the “relaxed CWS” is due to a complicated probability problem, which we could not resolve at this point. In this paper, we propose using extremal processes to generate samples for estimating the Jaccard similarity. Surprisingly, the proposed “extremal sampling” (ES) scheme makes it possible to analyze the “relaxed ES” variant. Through some novel probability endeavours, we are able to rigorously compute the bias of the “relaxed ES”, which, to a good extent, explains why the “relaxed ES” works so well and when it does not, in extreme corner cases. Interestingly, compared with CWS, the resultant algorithm only involves counting and does not need sophisticated mathematical operations (as required by CWS). It is therefore not surprising that the proposed ES scheme is noticeably faster than CWS. Although ES is different from CWS (and other algorithms in the literature for estimating the Jaccard similarity), in retrospect ES is indeed closely related to CWS. This paper provides the much-needed insight that connects CWS with extremal processes. This insight may help in understanding CWS (and its variants) and in developing new algorithms for similarity estimation in future research.
{"title":"Consistent Sampling Through Extremal Process","authors":"P. Li, Xiaoyun Li, G. Samorodnitsky, Weijie Zhao","doi":"10.1145/3442381.3449955","DOIUrl":"https://doi.org/10.1145/3442381.3449955","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130669826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
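The two quantities named in the abstract above, the weighted Jaccard similarity and the minwise-hashing estimate of resemblance for binary data, can be made concrete with a short sketch. It shows the exact weighted Jaccard (sum of element-wise minima over sum of maxima) and a plain minwise-hashing estimator; it implements neither CWS nor the proposed extremal sampling. The seeded-random hashing scheme and the function names are assumptions chosen to keep the example self-contained.

```python
import random


def weighted_jaccard(u, v):
    """Weighted Jaccard: sum of element-wise minima over sum of maxima."""
    keys = set(u) | set(v)
    num = sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    den = sum(max(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def minhash_signature(items, num_hashes, seed=0):
    """Minwise hashing for a non-empty set of items.

    Each "hash function" i assigns every item a pseudo-random value that
    depends only on (seed, i, item), so two sets hash consistently.
    """
    signature = []
    for i in range(num_hashes):
        signature.append(
            min(random.Random(f"{seed}-{i}-{x}").random() for x in items)
        )
    return signature


def estimate_resemblance(sig_a, sig_b):
    """Fraction of hash functions on which the two minima collide."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Exact weighted Jaccard on two small weighted vectors.
u = {"a": 2.0, "b": 1.0}
v = {"a": 1.0, "b": 3.0}
exact = weighted_jaccard(u, v)  # (1 + 1) / (2 + 3)

# Minwise-hashing estimate for two binary sets with true resemblance 2/4.
set_a = {"w", "x", "y"}
set_b = {"x", "y", "z"}
estimate = estimate_resemblance(
    minhash_signature(set_a, 256), minhash_signature(set_b, 256)
)
```

For binary vectors the weighted Jaccard reduces to the resemblance, which is why the minwise estimate concentrates around the exact value as the number of hash functions grows.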
Fei Li, Z. Wang, S. Hui, L. Liao, Dandan Song, Jing Xu
Named Entity Recognition (NER) is a fundamental problem in Natural Language Processing and has received much research attention. Although current neural-based NER approaches have achieved state-of-the-art performance, they still suffer from one or more of the following three problems in their architectures: (1) boundary tag sparsity, (2) lack of global decoding information, and (3) boundary error propagation. In this paper, we propose a novel Boundary-aware Bidirectional Neural Networks (Ba-BNN) model to tackle these problems for neural-based NER. The proposed Ba-BNN model is built on the structure of pointer networks to tackle the first problem, boundary tag sparsity. Moreover, we use a boundary-aware binary classifier to capture the global decoding information as input to the decoders. In the Ba-BNN model, we propose to use two decoders to process the information in two different directions (i.e., left-to-right and right-to-left). The final hidden states of the left-to-right decoder are obtained by incorporating the hidden states of the right-to-left decoder in the decoding process. In addition, a boundary retraining strategy is proposed to help reduce the boundary error propagation caused by the pointer networks in boundary detection and entity classification. We have conducted extensive experiments on three NER benchmark datasets. The results show that the proposed Ba-BNN model outperforms the current state-of-the-art models.
{"title":"Effective Named Entity Recognition with Boundary-aware Bidirectional Neural Networks","authors":"Fei Li, Z. Wang, S. Hui, L. Liao, Dandan Song, Jing Xu","doi":"10.1145/3442381.3449995","DOIUrl":"https://doi.org/10.1145/3442381.3449995","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134477898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anil Yelam, Shibani Subbareddy, Keerthana Ganesan, S. Savage, A. Mirian
“Serverless” cloud services, such as AWS lambdas, are one of the fastest-growing segments of the cloud services market. These services are popular in part because of their lightweight nature and their flexibility in scheduling and cost; however, the security issues associated with serverless computing are not well understood. In this work, we explore the feasibility of constructing a practical covert channel from lambdas. We establish that fast co-residence detection for lambdas is key to enabling such a covert channel, and proceed to develop a reliable and scalable co-residence detector based on the memory bus hardware. Our technique enables dynamic discovery of co-resident lambdas and is fast, executing in a matter of seconds. We evaluate our approach for correctness and scalability, and use it to establish covert channels and perform data transfer on AWS lambdas. We show that we can establish hundreds of individual covert channels for every 1000 lambdas deployed, and that each of those channels can send data at a rate of 00 bits per second, thus demonstrating that covert communication via lambdas is entirely feasible.
{"title":"CoResident Evil: Covert Communication In The Cloud With Lambdas","authors":"Anil Yelam, Shibani Subbareddy, Keerthana Ganesan, S. Savage, A. Mirian","doi":"10.1145/3442381.3450100","DOIUrl":"https://doi.org/10.1145/3442381.3450100","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134457363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Several important datacenter applications cause incast congestion, which severely degrades the flow completion times of short flows and the throughput of long flows. Further, because most flows are short and the incast duration is shorter than typical round-trip times, reactive mechanisms that rely on congestion control are not effective. While modern datacenter topologies provide high bisection bandwidth to support all-to-all traffic, incast is fundamentally a many-to-one traffic pattern and therefore requires deep buffers or high bandwidth at the network edge. We propose Superways, a heterogeneous datacenter topology that provides higher bandwidth for some servers to absorb incasts, since incasts occur only at a small number of servers that aggregate responses from other senders. Our design is based on the key observation that the small subset of servers that aggregate responses is likely to be network bound, whereas most other servers, which communicate only with random servers, are not. Superways can be implemented over many existing datacenter topologies and can be expanded flexibly without incurring high cost and cabling complexity. We also provide a heuristic for scheduling jobs in our topology to fully utilize the extra capacity. Using a real CloudLab implementation and ns-3 simulations, we show that Superways significantly improves flow completion times and throughput over existing datacenter topologies. We also analyze cost and cabling complexity, and discuss how to expand our topology.
{"title":"Superways: A Datacenter Topology for Incast-heavy workloads","authors":"Hamed Rezaei, Balajee Vamanan","doi":"10.1145/3442381.3449966","DOIUrl":"https://doi.org/10.1145/3442381.3449966","url":null,"PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133161811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}