TRAP: Two-level Regularized Autoencoder-based Embedding for Power-law Distributed Data
Dongmin Park, Hwanjun Song, Minseok Kim, Jae-Gil Lee
DOI: https://doi.org/10.1145/3366423.3380233
Recently, autoencoder (AE)-based embedding approaches have achieved state-of-the-art performance in many tasks, especially in top-k recommendation with user embedding and node classification with node embedding. However, we find that many real-world datasets follow a power-law distribution with respect to data object sparsity. When learning AE-based embeddings of such data, dense inputs move away from sparse inputs in the embedding space even when the two are highly correlated. This phenomenon, which we call polarization, clearly distorts the embedding. In this paper, we propose TRAP, which leverages two-level regularizers to effectively alleviate the polarization problem. The macroscopic regularizer generally prevents dense input objects from being distant from other sparse input objects, and the microscopic regularizer individually attracts each object to correlated neighbor objects rather than uncorrelated ones. Importantly, TRAP is a meta-algorithm that can be easily coupled with existing AE-based embedding methods with a simple modification. In extensive experiments on two representative embedding tasks using six real-world datasets, TRAP boosted the performance of state-of-the-art algorithms by up to 31.53% and 94.99%, respectively.
Beyond Rank-1: Discovering Rich Community Structure in Multi-Aspect Graphs
Ekta Gujral, Ravdeep Pasricha, E. Papalexakis
DOI: https://doi.org/10.1145/3366423.3380129
How are communities in real multi-aspect, or multi-view, graphs structured? How can we effectively and concisely summarize and explore those communities in a high-dimensional, multi-aspect graph without losing important information? State-of-the-art studies have focused on patterns in single graphs, identifying structures in a single snapshot of a large network or in time-evolving graphs and stitching them over time. However, to the best of our knowledge, there is no method that discovers and summarizes community structure from a multi-aspect graph by jointly leveraging information from all aspects. The state of the art in multi-aspect/tensor community extraction is limited to discovering clique structure in the extracted communities or, even worse, imposing clique structure where it does not exist. In this paper we bridge that gap by empowering tensor-based methods to extract rich community structure from multi-aspect graphs. In particular, we introduce cLL1, a novel constrained Block Term Tensor Decomposition that is capable of extracting higher-than-rank-1 but still interpretable structure from a multi-aspect dataset. Subsequently, we propose RichCom, a community structure extraction and summarization algorithm that leverages cLL1 to identify rich community structure (e.g., cliques, stars, and chains) while exploiting higher-order correlations between the different aspects of the graph. Our contributions are four-fold: (a) Novel algorithm: we develop cLL1, an efficient framework to extract rich and interpretable structure from general multi-aspect data; (b) Graph summarization and exploration: we provide RichCom, a summarization and encoding scheme to discover and explore the structures of communities identified by cLL1; (c) Multi-aspect graph generator: we provide a simple and effective synthetic multi-aspect graph generator; and (d) Real-world utility: we present empirical results on small and large real datasets that demonstrate performance on par with or superior to the existing state of the art.
Characterizing Search-Engine Traffic to Internet Research Agency Web Properties
Alexander Spangher, G. Ranade, Besmira Nushi, Adam Fourney, E. Horvitz
DOI: https://doi.org/10.1145/3366423.3380290
The Russia-based Internet Research Agency (IRA) carried out a broad information campaign in the U.S. before and after the 2016 presidential election. The organization created an expansive set of internet properties: web domains, Facebook pages, and Twitter bots, which received traffic via purchased Facebook ads, tweets, and search engines indexing their domains. In this paper, we focus on IRA activities that received exposure through search engines, by joining data from Facebook and Twitter with logs from the Internet Explorer 11 and Edge browsers and the Bing.com search engine. We find that a substantial volume of Russian content was apolitical and emotionally neutral in nature. Our observations demonstrate that such content gave IRA web properties considerable exposure through search engines and brought readers to websites hosting inflammatory content and engagement hooks. Our findings show that, like social media, web search also directed traffic to IRA-generated web content, and the resultant traffic patterns are distinct from those of social media.
PARS: Peers-aware Recommender System
Huiqiang Mao, Yanzhi Li, Chenliang Li, Di Chen, Xiaoqing Wang, Yuming Deng
DOI: https://doi.org/10.1145/3366423.3380013
The presence or absence of one item in a recommendation list will affect the demand for other items, because customers are often willing to switch to other items if their most preferred items are not available. This cross-item influence, called the "peers effect," has been largely ignored in the literature. In this paper, we develop a peers-aware recommender system named PARS. We apply a ranking-based choice model to capture the cross-item influence and solve the resulting MaxMin problem with a decomposition algorithm. The MaxMin model solves for the recommendation decision while simultaneously estimating users' preferences toward the items, which yields high-quality recommendations robust to input data variation. Experimental results illustrate that PARS outperforms several frequently used methods in practice. An online evaluation in a flash-sales scenario at Taobao also shows that PARS delivers significant improvements in terms of both conversion rates and user value.
Active Domain Transfer on Network Embedding
Lichen Jin, Yizhou Zhang, Guojie Song, Yilun Jin
DOI: https://doi.org/10.1145/3366423.3380024
Recent works show that end-to-end, (semi-)supervised network embedding models can generate satisfactory vectors to represent network topology and are even applicable to unseen graphs through inductive learning. However, domain mismatch between the training and testing networks in inductive learning, as well as a lack of labeled data, often compromises the outcome of such methods. To make matters worse, while transfer learning and active learning techniques, which can address these two problems respectively, have been well studied on regular i.i.d. data, relatively little attention has been paid to networks. Consequently, in this paper we propose a method for active transfer learning on networks named active-transfer network embedding, abbreviated ATNE. In ATNE we jointly consider the influence of each node on the network from the perspectives of transfer and active learning, and hence design novel and effective influence scores combining both aspects in the training process to facilitate node selection. We demonstrate that ATNE is efficient and decoupled from the actual model used. Further extensive experiments show that ATNE outperforms state-of-the-art active node selection methods and shows versatility in different situations.
Real-Time Clustering for Large Sparse Online Visitor Data
G. Chan, F. Du, Ryan A. Rossi, Anup B. Rao, Eunyee Koh, Cláudio T. Silva, J. Freire
DOI: https://doi.org/10.1145/3366423.3380183
Online visitor behavior is often modeled as a large sparse matrix, where rows represent visitors and columns represent behaviors. To discover customer segments with different hierarchies, marketers often need to cluster the data at different splits. Such analyses require the clustering algorithm to respond to user parameter changes in real time, which current techniques cannot support. In this paper, we propose a real-time clustering algorithm, sparse density peaks, for large-scale sparse data. It pre-processes the input points to compute annotations and a hierarchy for cluster assignment. While the assignment itself is only a single scan of the points, naive pre-processing requires measuring all pairwise distances, which incurs quadratic computational overhead and is infeasible for even moderately sized data. Thus, we propose a new approach based on MinHash and LSH that provides fast and accurate estimations. We also describe an efficient implementation on Spark that addresses data skew and memory usage. Our experiments show that our approach (1) provides a better approximation than a straightforward MinHash-and-LSH implementation in terms of accuracy on real datasets, (2) achieves a 20× speedup in the end-to-end clustering pipeline, and (3) maintains its computations within a small memory footprint. Finally, we present an interface to explore customer segments from millions of online visitor records in real time.
Crowd Teaching with Imperfect Labels
Yao Zhou, A. R. Nelakurthi, Ross Maciejewski, Wei Fan, Jingrui He
DOI: https://doi.org/10.1145/3366423.3380099
The need for annotated labels to train machine learning models has led to a surge in crowdsourcing: collecting labels from non-experts. Instead of annotating from scratch, given an imperfect labeled set, how can we leverage the label information obtained from amateur crowd workers to improve the data quality? Furthermore, is there a way to teach the amateur crowd workers using this imperfect labeled set in order to improve their labeling performance? In this paper, we aim to answer both questions via a novel interactive teaching framework, which uses visual explanations to simultaneously teach and gauge the confidence level of the crowd workers. Motivated by the huge demand for fine-grained label information in real-world applications, we start from the realistic yet challenging assumption that neither the teacher nor the crowd workers are perfect. Then, we propose an adaptive scheme that can improve both of them through a sequence of interactions: the teacher teaches the workers using labeled data, and in return, the workers provide labels and the associated confidence level based on their own expertise. In particular, the teacher performs teaching using an empirical risk minimizer learned from an imperfect labeled set; the workers are assumed to exhibit forgetting behavior during learning, and their learning rate depends on the interpretation difficulty of the teaching item. Furthermore, depending on the workers' level of confidence when labeling, we also show that the empirical risk minimizer used by the teacher is a reliable and realistic substitute for the unknown target concept, by utilizing an unbiased surrogate loss. Finally, the performance of the proposed framework is demonstrated through experiments on multiple real-world image and text datasets.
Directional and Explainable Serendipity Recommendation
Xueqi Li, Wenjun Jiang, Weiguang Chen, Jie Wu, Guojun Wang, Kenli Li
DOI: https://doi.org/10.1145/3366423.3380100
Serendipity recommendation has attracted more and more attention in recent years; it is committed to providing recommendations that not only cater to users' demands but also broaden their horizons. However, existing approaches usually measure user-item relevance with a scalar instead of a vector, ignoring the direction of user preference, which increases the risk of unrelated recommendations. In addition, reasonable explanations increase users' trust and acceptance, but no existing work provides explanations for serendipitous recommendations. To address these limitations, we propose a Directional and Explainable Serendipity Recommendation method named DESR. Specifically, we first extract users' long-term preferences with an unsupervised method based on a GMM (Gaussian Mixture Model) and capture their short-term demands with a capsule network. Then, we propose the serendipity vector, which combines long-term preferences with short-term demands, and use it to generate directionally serendipitous recommendations. Finally, a back-routing scheme is exploited to offer explanations. Extensive experiments on real-world datasets show that, compared with existing serendipity-based methods, DESR effectively improves serendipity and explainability while also promoting diversity.
Valve: Securing Function Workflows on Serverless Computing Platforms
P. Datta, P. Kumar, Tristan Morris, M. Grace, Amir Rahmati, Adam Bates
DOI: https://doi.org/10.1145/3366423.3380173
Serverless computing has quickly emerged as a dominant cloud computing paradigm, allowing developers to rapidly prototype event-driven applications using compositions of small functions that each perform a single logical task. However, many such application workflows are based in part on publicly available functions developed by third parties, creating the potential for functions to behave in unexpected, or even malicious, ways. At present, developers are not in total control of where and how their data flows, creating significant security and privacy risks in growth markets that have embraced serverless computing (e.g., IoT). As a practical means of addressing this problem, we present Valve, a serverless platform that enables developers to exert complete, fine-grained control over the information flows in their applications. Valve enables workflow developers to reason about function behaviors, and to specify restrictions, through auditing of network-layer information flows. By proxying network requests and propagating taint labels across network flows, Valve is able to restrict function behavior without code modification. We demonstrate that Valve is able to defend against known serverless attack behaviors, including container-reuse-based persistence and data exfiltration over cloud platform APIs, with less than 2.8% runtime overhead, 6.25% deployment overhead, and 2.35% teardown overhead.
Collective Multi-type Entity Alignment Between Knowledge Graphs
Qi Zhu, Hao Wei, Bunyamin Sisman, Da Zheng, C. Faloutsos, Xin Dong, Jiawei Han
DOI: https://doi.org/10.1145/3366423.3380289
A knowledge graph (e.g., Freebase, YAGO) is a multi-relational graph representing rich factual information among entities of various types. Entity alignment is the key step toward knowledge graph integration from multiple sources; it aims to identify entities across different knowledge graphs that refer to the same real-world entity. However, current entity alignment systems overlook the sparsity of different knowledge graphs and cannot align multiple types of entities with a single model. In this paper, we present a Collective Graph neural network for Multi-type entity Alignment, called CG-MuAlign. Different from previous work, CG-MuAlign jointly aligns multiple types of entities, collectively leverages neighborhood information, and generalizes to unlabeled entity types. Specifically, we propose a novel collective aggregation function tailored to this task that (1) relieves the incompleteness of knowledge graphs via both cross-graph and self attention, and (2) scales up efficiently with a mini-batch training paradigm and an effective neighborhood sampling strategy. We conduct experiments on real-world knowledge graphs with millions of entities and observe performance superior to existing methods. In addition, the running time of our approach is much less than that of current state-of-the-art deep learning methods.