Yuan Deng, Sébastien Lahaie, V. Mirrokni, Song Zuo
An incentive-compatible auction incentivizes buyers to truthfully reveal their private valuations. However, many ad auction mechanisms deployed in practice are not incentive-compatible, such as first-price auctions (for display advertising) and the generalized second-price auction (for search advertising). We introduce a new metric to quantify incentive compatibility in both static and dynamic environments. Our metric is data-driven and can be computed directly through black-box auction simulations without relying on reference mechanisms or complex optimizations. We provide interpretable characterizations of our metric and prove that it is monotone in auction parameters for several mechanisms used in practice, such as soft floors and dynamic reserve prices. We empirically evaluate our metric on ad auction data from a major ad exchange and a major search engine to demonstrate its broad applicability in practice.
{"title":"A Data-Driven Metric of Incentive Compatibility","authors":"Yuan Deng, Sébastien Lahaie, V. Mirrokni, Song Zuo","doi":"10.1145/3366423.3380249","DOIUrl":"https://doi.org/10.1145/3366423.3380249","url":null,"abstract":"An incentive-compatible auction incentivizes buyers to truthfully reveal their private valuations. However, many ad auction mechanisms deployed in practice are not incentive-compatible, such as first-price auctions (for display advertising) and the generalized second-price auction (for search advertising). We introduce a new metric to quantify incentive compatibility in both static and dynamic environments. Our metric is data-driven and can be computed directly through black-box auction simulations without relying on reference mechanisms or complex optimizations. We provide interpretable characterizations of our metric and prove that it is monotone in auction parameters for several mechanisms used in practice, such as soft floors and dynamic reserve prices. We empirically evaluate our metric on ad auction data from a major ad exchange and a major search engine to demonstrate its broad applicability in practice.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76932737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The measure of similarity between nodes in a graph is a useful tool in many areas of computer science. SimRank, proposed by Jeh and Widom [7], is a classic measure of similarities of nodes in graph that has both theoretical and intuitive properties and has been extensively studied and used in many applications such as Query-Rewriting, link prediction, collaborative filtering and so on. Existing works based on Simrank primarily focus on preserving the microscopic structure, such as the second and third order proximity of the vertices, while the macroscopic scale-free property is largely ignored. Scale-free property is a critical property of any real-world web graphs where the vertex degrees follow a heavy-tailed distribution. In this paper, we introduce P-Simrank which extends the idea of Simrank to Scale-free bipartite networks. To study the efficacy of the proposed solution on a real world problem, we tested the same on the well known query-rewriting problem in sponsored search domain using bipartite click graph, similar to Simrank++ [1], which acts as our baseline. We show that Simrank++ produces sub-optimal similarity scores in case of bipartite graphs where degree distribution of vertices follow power-law. We also show how P-Simrank can be optimized for real-world large graphs. Finally, we experimentally evaluate P-Simrank algorithm against Simrank++, using actual click graphs obtained from Bing, and show that P-Simrank outperforms Simrank++ in variety of metrics.
{"title":"P-Simrank: Extending Simrank to Scale-Free Bipartite Networks","authors":"Prasenjit Dey, Kunal Goel, Rahul Agrawal","doi":"10.1145/3366423.3380081","DOIUrl":"https://doi.org/10.1145/3366423.3380081","url":null,"abstract":"The measure of similarity between nodes in a graph is a useful tool in many areas of computer science. SimRank, proposed by Jeh and Widom [7], is a classic measure of similarities of nodes in graph that has both theoretical and intuitive properties and has been extensively studied and used in many applications such as Query-Rewriting, link prediction, collaborative filtering and so on. Existing works based on Simrank primarily focus on preserving the microscopic structure, such as the second and third order proximity of the vertices, while the macroscopic scale-free property is largely ignored. Scale-free property is a critical property of any real-world web graphs where the vertex degrees follow a heavy-tailed distribution. In this paper, we introduce P-Simrank which extends the idea of Simrank to Scale-free bipartite networks. To study the efficacy of the proposed solution on a real world problem, we tested the same on the well known query-rewriting problem in sponsored search domain using bipartite click graph, similar to Simrank++ [1], which acts as our baseline. We show that Simrank++ produces sub-optimal similarity scores in case of bipartite graphs where degree distribution of vertices follow power-law. We also show how P-Simrank can be optimized for real-world large graphs. Finally, we experimentally evaluate P-Simrank algorithm against Simrank++, using actual click graphs obtained from Bing, and show that P-Simrank outperforms Simrank++ in variety of metrics.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"84 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77021093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vasudevan Nagendra, A. Bhattacharya, V. Yegneswaran, Amir Rahmati, Samir R Das
Consumer IoT networks are characterized by heterogeneous devices with diverse functionality and programming interfaces. This lack of homogeneity makes the integration and secure management of IoT infrastructures a daunting task for users and administrators. In this paper, we introduce VISCR, a Vendor-Independent policy Specification and Conflict Resolution engine that enables intent-based conflict-free policy specification and enforcement in IoT environments. VISCR converts the topology of the IoT infrastructure into a tree-based abstraction and translates existing policies from heterogeneous vendor-specific programming languages, such as Groovy-based SmartThings, OpenHAB, IFTTT-based templates, and MUD-based profiles, into a vendor-independent graph-based specification. These are then used to automatically detect rogue policies, policy conflicts, and automation bugs. We evaluated VISCR using a dataset of 907 IoT apps, programmed using heterogeneous automation specifications, in a simulated smart-building IoT infrastructure. In our experiments, among 907 IoT apps, VISCR exposed 342 of IoT apps as exhibiting one or more violations, while also running 14.2x faster than the state-of-the-art tool (Soteria). VISCR detected 100% of violations reported by Soteria while also detecting new types of violations in 266 additional apps.
{"title":"An Intent-Based Automation Framework for Securing Dynamic Consumer IoT Infrastructures","authors":"Vasudevan Nagendra, A. Bhattacharya, V. Yegneswaran, Amir Rahmati, Samir R Das","doi":"10.1145/3366423.3380234","DOIUrl":"https://doi.org/10.1145/3366423.3380234","url":null,"abstract":"Consumer IoT networks are characterized by heterogeneous devices with diverse functionality and programming interfaces. This lack of homogeneity makes the integration and secure management of IoT infrastructures a daunting task for users and administrators. In this paper, we introduce VISCR, a Vendor-Independent policy Specification and Conflict Resolution engine that enables intent-based conflict-free policy specification and enforcement in IoT environments. VISCR converts the topology of the IoT infrastructure into a tree-based abstraction and translates existing policies from heterogeneous vendor-specific programming languages, such as Groovy-based SmartThings, OpenHAB, IFTTT-based templates, and MUD-based profiles, into a vendor-independent graph-based specification. These are then used to automatically detect rogue policies, policy conflicts, and automation bugs. We evaluated VISCR using a dataset of 907 IoT apps, programmed using heterogeneous automation specifications, in a simulated smart-building IoT infrastructure. In our experiments, among 907 IoT apps, VISCR exposed 342 of IoT apps as exhibiting one or more violations, while also running 14.2x faster than the state-of-the-art tool (Soteria). VISCR detected 100% of violations reported by Soteria while also detecting new types of violations in 266 additional apps.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84596450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Personalized search improves generic ranking models by taking user interests into consideration and returning more accurate search results to individual users. In recent years, machine learning and deep learning techniques have been successfully applied in personalized search. Most existing personalization models simply regard the search history as a static set of user behaviours and learn fixed ranking strategies based on the recorded data. Though improvements have been observed, it is obvious that these methods ignore the dynamic nature of the search process: search is a sequence of interactions between the search engine and the user. During the search process, the user interests may dynamically change. It would be more helpful if a personalized search model could track the whole interaction process and update its ranking strategy continuously. In this paper, we propose a reinforcement learning based personalization model, referred to as RLPer, to track the sequential interactions between the users and search engine with a hierarchical Markov Decision Process (MDP). In RLPer, the search engine interacts with the user to update the underlying ranking model continuously with real-time feedback. And we design a feedback-aware personalized ranking component to catch the user’s feedback which has impacts on the user interest profile for the next query. Experimental results on the publicly available AOL search log verify that our proposed model can significantly outperform state-of-the-art personalized search models.
{"title":"RLPer: A Reinforcement Learning Model for Personalized Search","authors":"Jing Yao, Zhicheng Dou, Jun Xu, Ji-rong Wen","doi":"10.1145/3366423.3380294","DOIUrl":"https://doi.org/10.1145/3366423.3380294","url":null,"abstract":"Personalized search improves generic ranking models by taking user interests into consideration and returning more accurate search results to individual users. In recent years, machine learning and deep learning techniques have been successfully applied in personalized search. Most existing personalization models simply regard the search history as a static set of user behaviours and learn fixed ranking strategies based on the recorded data. Though improvements have been observed, it is obvious that these methods ignore the dynamic nature of the search process: search is a sequence of interactions between the search engine and the user. During the search process, the user interests may dynamically change. It would be more helpful if a personalized search model could track the whole interaction process and update its ranking strategy continuously. In this paper, we propose a reinforcement learning based personalization model, referred to as RLPer, to track the sequential interactions between the users and search engine with a hierarchical Markov Decision Process (MDP). In RLPer, the search engine interacts with the user to update the underlying ranking model continuously with real-time feedback. And we design a feedback-aware personalized ranking component to catch the user’s feedback which has impacts on the user interest profile for the next query. Experimental results on the publicly available AOL search log verify that our proposed model can significantly outperform state-of-the-art personalized search models.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"87 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89915837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User services increasingly base their actions on AI models, e.g., to offer personalized and proactive support. However, the underlying AI algorithms require a continuous stream of personal data—leading to privacy issues, as users typically have to share this data out of their territory. Current privacy-preserving concepts are either not applicable to such AI-based services or to the disadvantage of any party. This paper presents PrivAI, a new decentralized and privacy-by-design platform for overcoming the need for sharing user data to benefit from personalized AI services. In short, PrivAI complements existing approaches to personal data stores, but strictly enforces the confinement of raw user data. PrivAI further addresses the resulting challenges by (1) dividing AI algorithms into cloud-based general model training, subsequent local personalization, and community-based sharing of model updates for new users; by (2) loading confidential AI models into a trusted execution environment, and thus, protecting provider’s intellectual property (IP). Our experiments show the feasibility and effectiveness of PrivAI with comparable performance as currently-practiced approaches.
{"title":"Privacy-preserving AI Services Through Data Decentralization","authors":"Christian Meurisch, Bekir Bayrak, M. Mühlhäuser","doi":"10.1145/3366423.3380106","DOIUrl":"https://doi.org/10.1145/3366423.3380106","url":null,"abstract":"User services increasingly base their actions on AI models, e.g., to offer personalized and proactive support. However, the underlying AI algorithms require a continuous stream of personal data—leading to privacy issues, as users typically have to share this data out of their territory. Current privacy-preserving concepts are either not applicable to such AI-based services or to the disadvantage of any party. This paper presents PrivAI, a new decentralized and privacy-by-design platform for overcoming the need for sharing user data to benefit from personalized AI services. In short, PrivAI complements existing approaches to personal data stores, but strictly enforces the confinement of raw user data. PrivAI further addresses the resulting challenges by (1) dividing AI algorithms into cloud-based general model training, subsequent local personalization, and community-based sharing of model updates for new users; by (2) loading confidential AI models into a trusted execution environment, and thus, protecting provider’s intellectual property (IP). Our experiments show the feasibility and effectiveness of PrivAI with comparable performance as currently-practiced approaches.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74247680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As considerable amounts of POI check-in data have been accumulated, successive point-of-interest (POI) recommendation is increasingly popular. Existing successive POI recommendation methods only predict where user will go next, ignoring when this behavior will occur. In this work, we focus on predicting POIs that will be visited by users in the next 24 hours. As check-in data is very sparse, it is challenging to accurately capture user preferences in temporal patterns. To this end, we propose a category-aware deep model CatDM that incorporates POI category and geographical influence to reduce search space to overcome data sparsity. We design two deep encoders based on LSTM to model the time series data. The first encoder captures user preferences in POI categories, whereas the second exploits user preferences in POIs. Considering clock influence in the second encoder, we divide each user’s check-in history into several different time windows and develop a personalized attention mechanism for each window to facilitate CatDM to exploit temporal patterns. Moreover, to sort the candidate set, we consider four specific dependencies: user-POI, user-category, POI-time and POI-user current preferences. Extensive experiments are conducted on two large real datasets. The experimental results demonstrate that our CatDM outperforms the state-of-the-art models for successive POI recommendation on sparse check-in data.
{"title":"A Category-Aware Deep Model for Successive POI Recommendation on Sparse Check-in Data","authors":"Fuqiang Yu, Li-zhen Cui, Wei Guo, Xudong Lu, Qingzhong Li, Hua Lu","doi":"10.1145/3366423.3380202","DOIUrl":"https://doi.org/10.1145/3366423.3380202","url":null,"abstract":"As considerable amounts of POI check-in data have been accumulated, successive point-of-interest (POI) recommendation is increasingly popular. Existing successive POI recommendation methods only predict where user will go next, ignoring when this behavior will occur. In this work, we focus on predicting POIs that will be visited by users in the next 24 hours. As check-in data is very sparse, it is challenging to accurately capture user preferences in temporal patterns. To this end, we propose a category-aware deep model CatDM that incorporates POI category and geographical influence to reduce search space to overcome data sparsity. We design two deep encoders based on LSTM to model the time series data. The first encoder captures user preferences in POI categories, whereas the second exploits user preferences in POIs. Considering clock influence in the second encoder, we divide each user’s check-in history into several different time windows and develop a personalized attention mechanism for each window to facilitate CatDM to exploit temporal patterns. Moreover, to sort the candidate set, we consider four specific dependencies: user-POI, user-category, POI-time and POI-user current preferences. Extensive experiments are conducted on two large real datasets. The experimental results demonstrate that our CatDM outperforms the state-of-the-art models for successive POI recommendation on sparse check-in data.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"54 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74289044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the rise of mobile video, user-generated content, and social networks, there is a massive opportunity for disruptive innovations in the media and content industry. It is now a fast-changing landscape with rapid advances in AI-powered content creation, dissemination and interaction technologies. I believe the current trends are leading us towards a world where everyone is equally empowered to produce high-quality content in video, music, augmented reality or more – and to share their information, knowledge, and stories with a large global audience. This new AI- powered content platform can further lead to innovations in advertising, e-commerce, online education, and productivity. I will share the current research efforts at ByteDance connected to this emerging new platform through products such as Douyin and TikTok, and discuss the challenges and the direction of our future research.
{"title":"Democratizing Content Creation and Dissemination through AI Technology","authors":"Wei-Ying Ma","doi":"10.1145/3366423.3382669","DOIUrl":"https://doi.org/10.1145/3366423.3382669","url":null,"abstract":"With the rise of mobile video, user-generated content, and social networks, there is a massive opportunity for disruptive innovations in the media and content industry. It is now a fast-changing landscape with rapid advances in AI-powered content creation, dissemination and interaction technologies. I believe the current trends are leading us towards a world where everyone is equally empowered to produce high-quality content in video, music, augmented reality or more – and to share their information, knowledge, and stories with a large global audience. This new AI- powered content platform can further lead to innovations in advertising, e-commerce, online education, and productivity. I will share the current research efforts at ByteDance connected to this emerging new platform through products such as Douyin and TikTok, and discuss the challenges and the direction of our future research.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"116 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80371635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Conventional multi-task model restricts the task structure to be linearly related, which may not be suitable when data is linearly nonseparable. To remedy this issue, we propose a kernel algorithm for online multi-task classification, as the large approximation space provided by reproducing kernel Hilbert spaces often contains an accurate function. Specifically, it maintains a local-global Gaussian distribution over each task model that guides the direction and scale of parameter updates. Nonetheless, optimizing over this space is computationally expensive. Moreover, most multi-task learning methods require accessing to the entire training instances, which is luxury unavailable in the large-scale streaming learning scenario. To overcome this issue, we propose a randomized kernel sampling technique across multiple tasks. Instead of requiring all inputs’ labels, the proposed algorithm determines whether to query a label or not via considering the confidence from the related tasks over label prediction. Theoretically, the algorithm trained on actively sampled labels can achieve a comparable result with one learned on all labels. Empirically, the proposed algorithm is able to achieve promising learning efficacy, while reducing the computational complexity and labeling cost simultaneously.
{"title":"Efficient Online Multi-Task Learning via Adaptive Kernel Selection","authors":"Peng Yang, P. Li","doi":"10.1145/3366423.3379993","DOIUrl":"https://doi.org/10.1145/3366423.3379993","url":null,"abstract":"Conventional multi-task model restricts the task structure to be linearly related, which may not be suitable when data is linearly nonseparable. To remedy this issue, we propose a kernel algorithm for online multi-task classification, as the large approximation space provided by reproducing kernel Hilbert spaces often contains an accurate function. Specifically, it maintains a local-global Gaussian distribution over each task model that guides the direction and scale of parameter updates. Nonetheless, optimizing over this space is computationally expensive. Moreover, most multi-task learning methods require accessing to the entire training instances, which is luxury unavailable in the large-scale streaming learning scenario. To overcome this issue, we propose a randomized kernel sampling technique across multiple tasks. Instead of requiring all inputs’ labels, the proposed algorithm determines whether to query a label or not via considering the confidence from the related tasks over label prediction. Theoretically, the algorithm trained on actively sampled labels can achieve a comparable result with one learned on all labels. Empirically, the proposed algorithm is able to achieve promising learning efficacy, while reducing the computational complexity and labeling cost simultaneously.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89504862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Searching for documents with semantically similar content is a fundamental problem in the information retrieval domain with various challenges, primarily, in terms of efficiency and effectiveness. Despite the promise of modeling structured dependencies in documents, several existing text hashing methods lack an efficient mechanism to incorporate such vital information. Additionally, the desired characteristics of an ideal hash function, such as robustness to noise, low quantization error and bit balance/uncorrelation, are not effectively learned with existing methods. This is because of the requirement to either tune additional hyper-parameters or optimize these heuristically and explicitly constructed cost functions. In this paper, we propose a Denoising Adversarial Binary Autoencoder (DABA) model which presents a novel representation learning framework that captures structured representation of text documents in the learned hash function. Also, adversarial training provides an alternative direction to implicitly learn a hash function that captures all the desired characteristics of an ideal hash function. Essentially, DABA adopts a novel single-optimization adversarial training procedure that minimizes the Wasserstein distance in its primal domain to regularize the encoder’s output of either a recurrent neural network or a convolutional autoencoder. We empirically demonstrate the effectiveness of our proposed method in capturing the intrinsic semantic manifold of the related documents. The proposed method outperforms the current state-of-the-art shallow and deep unsupervised hashing methods for the document retrieval task on several prominent document collections.
{"title":"Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder","authors":"Khoa D Doan, C. Reddy","doi":"10.1145/3366423.3380150","DOIUrl":"https://doi.org/10.1145/3366423.3380150","url":null,"abstract":"Searching for documents with semantically similar content is a fundamental problem in the information retrieval domain with various challenges, primarily, in terms of efficiency and effectiveness. Despite the promise of modeling structured dependencies in documents, several existing text hashing methods lack an efficient mechanism to incorporate such vital information. Additionally, the desired characteristics of an ideal hash function, such as robustness to noise, low quantization error and bit balance/uncorrelation, are not effectively learned with existing methods. This is because of the requirement to either tune additional hyper-parameters or optimize these heuristically and explicitly constructed cost functions. In this paper, we propose a Denoising Adversarial Binary Autoencoder (DABA) model which presents a novel representation learning framework that captures structured representation of text documents in the learned hash function. Also, adversarial training provides an alternative direction to implicitly learn a hash function that captures all the desired characteristics of an ideal hash function. Essentially, DABA adopts a novel single-optimization adversarial training procedure that minimizes the Wasserstein distance in its primal domain to regularize the encoder’s output of either a recurrent neural network or a convolutional autoencoder. We empirically demonstrate the effectiveness of our proposed method in capturing the intrinsic semantic manifold of the related documents. The proposed method outperforms the current state-of-the-art shallow and deep unsupervised hashing methods for the document retrieval task on several prominent document collections.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81116063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhuoren Jiang, Zheng Gao, Jinjiong Lan, Hongxia Yang, Yao Lu, Xiaozhong Liu
The recent success of deep graph embedding innovates the graphical information characterization methodologies. However, in real-world applications, such a method still struggles with the challenges of heterogeneity, scalability, and multiplex. To address these challenges, in this study, we propose a novel solution, Genetic hEterogeneous gRaph eMbedding (GERM), which enables flexible and efficient task-driven vertex embedding in a complex heterogeneous graph. Unlike prior efforts for this track of studies, we employ a task-oriented genetic activation strategy to efficiently generate the “Edge Type Activated Vector” (ETAV) over the edge types in the graph. The generated ETAV can not only reduce the incompatible noise and navigate the heterogeneous graph random walk at the graph-schema level, but also activate an optimized subgraph for efficient representation learning. By revealing the correlation between the graph structure and task information, the model interpretability can be enhanced as well. Meanwhile, an activated heterogeneous skip-gram framework is proposed to encapsulate both topological and task-specific information of a given heterogeneous graph. Through extensive experiments on both scholarly and e-commerce datasets, we demonstrate the efficacy and scalability of the proposed methods via various search/recommendation tasks. GERM can significantly reduces the running time and remove expert-intervention without sacrificing the performance (or even modestly improve) by comparing with baselines.
{"title":"Task-Oriented Genetic Activation for Large-Scale Complex Heterogeneous Graph Embedding","authors":"Zhuoren Jiang, Zheng Gao, Jinjiong Lan, Hongxia Yang, Yao Lu, Xiaozhong Liu","doi":"10.1145/3366423.3380230","DOIUrl":"https://doi.org/10.1145/3366423.3380230","url":null,"abstract":"The recent success of deep graph embedding innovates the graphical information characterization methodologies. However, in real-world applications, such a method still struggles with the challenges of heterogeneity, scalability, and multiplex. To address these challenges, in this study, we propose a novel solution, Genetic hEterogeneous gRaph eMbedding (GERM), which enables flexible and efficient task-driven vertex embedding in a complex heterogeneous graph. Unlike prior efforts for this track of studies, we employ a task-oriented genetic activation strategy to efficiently generate the “Edge Type Activated Vector” (ETAV) over the edge types in the graph. The generated ETAV can not only reduce the incompatible noise and navigate the heterogeneous graph random walk at the graph-schema level, but also activate an optimized subgraph for efficient representation learning. By revealing the correlation between the graph structure and task information, the model interpretability can be enhanced as well. Meanwhile, an activated heterogeneous skip-gram framework is proposed to encapsulate both topological and task-specific information of a given heterogeneous graph. Through extensive experiments on both scholarly and e-commerce datasets, we demonstrate the efficacy and scalability of the proposed methods via various search/recommendation tasks. GERM can significantly reduces the running time and remove expert-intervention without sacrificing the performance (or even modestly improve) by comparing with baselines.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"52 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90469714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}