Taxi ridesharing becomes promising and attractive because of the wide availability of taxis in a city and tremendous benefits of ridesharing, e.g., alleviating traffic congestion and reducing energy consumption. Existing taxi ridesharing schemes, however, are not efficient and practical, due to they simply match ride requests and taxis based on partial trip information and omit the offline passengers, who hail a taxi at roadside with no explicit requests to the system. In this paper, we consider the mobility-aware taxi ridesharing problem, and present mT- Share to address these limitations. mT-Share fully exploits the mobility information of ride requests and taxis to achieve efficient indexing of taxis/requests and better passenger-taxi matching, while still satisfying the constraints on passengers’ deadlines and taxis’ capacities. Specifically, mT-Share indexes taxis and ride requests with both geographical information and travel directions, and supports the shortest path based routing and probabilistic routing to serve both online and offline ride requests. Extensive experiments with a large real-world taxi dataset demonstrate the efficiency and effectiveness of mT-Share, which can response each ride request in milliseconds and with a moderate detour cost. Compared to state-of-the-art methods, mT-Share serves 42% and 62% more ride requests in peak and non-peak hours, respectively.
{"title":"Mobility-Aware Dynamic Taxi Ridesharing","authors":"Zhidan Liu, Zengyang Gong, Jiangzhou Li, Kaishun Wu","doi":"10.1109/ICDE48307.2020.00088","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00088","url":null,"abstract":"Taxi ridesharing becomes promising and attractive because of the wide availability of taxis in a city and tremendous benefits of ridesharing, e.g., alleviating traffic congestion and reducing energy consumption. Existing taxi ridesharing schemes, however, are not efficient and practical, due to they simply match ride requests and taxis based on partial trip information and omit the offline passengers, who hail a taxi at roadside with no explicit requests to the system. In this paper, we consider the mobility-aware taxi ridesharing problem, and present mT- Share to address these limitations. mT-Share fully exploits the mobility information of ride requests and taxis to achieve efficient indexing of taxis/requests and better passenger-taxi matching, while still satisfying the constraints on passengers’ deadlines and taxis’ capacities. Specifically, mT-Share indexes taxis and ride requests with both geographical information and travel directions, and supports the shortest path based routing and probabilistic routing to serve both online and offline ride requests. Extensive experiments with a large real-world taxi dataset demonstrate the efficiency and effectiveness of mT-Share, which can response each ride request in milliseconds and with a moderate detour cost. Compared to state-of-the-art methods, mT-Share serves 42% and 62% more ride requests in peak and non-peak hours, respectively.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"1 1","pages":"961-972"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72722056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-04-01DOI: 10.1109/ICDE48307.2020.00025
Qi Zhang, Ronghua Li, Qixuan Yang, Guoren Wang, Lu Qin
The structural diversity of an edge, which is measured by the number of connected components of the edge’s ego-network, has recently been recognized as a key metric for analyzing social influence and information diffusion in social networks. Given this, an important problem in social network analysis is to identify top-k edges that have the highest structural diversities. In this work, we for the first time perform a systematical study for the top-k edge structural diversity search problem on large graphs. Specifically, we first develop a new online search framework with two basic upper-bounding rules to efficiently solve this problem. Then, we propose a new index structure using near-linear space to process the top-k edge structural diversity search in near-optimal time. To create such an index structure, we devise an efficient algorithm based on an interesting connection between our problem and the 4-clique enumeration problem. In addition, we also propose efficient index maintenance techniques to handle dynamic graphs. The results of extensive experiments on five large real-life datasets demonstrate the efficiency, scalability, and effectiveness of our algorithms.
{"title":"Efficient Top-k Edge Structural Diversity Search","authors":"Qi Zhang, Ronghua Li, Qixuan Yang, Guoren Wang, Lu Qin","doi":"10.1109/ICDE48307.2020.00025","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00025","url":null,"abstract":"The structural diversity of an edge, which is measured by the number of connected components of the edge’s ego-network, has recently been recognized as a key metric for analyzing social influence and information diffusion in social networks. Given this, an important problem in social network analysis is to identify top-k edges that have the highest structural diversities. In this work, we for the first time perform a systematical study for the top-k edge structural diversity search problem on large graphs. Specifically, we first develop a new online search framework with two basic upper-bounding rules to efficiently solve this problem. Then, we propose a new index structure using near-linear space to process the top-k edge structural diversity search in near-optimal time. To create such an index structure, we devise an efficient algorithm based on an interesting connection between our problem and the 4-clique enumeration problem. In addition, we also propose efficient index maintenance techniques to handle dynamic graphs. The results of extensive experiments on five large real-life datasets demonstrate the efficiency, scalability, and effectiveness of our algorithms.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"56 1","pages":"205-216"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74990275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-04-01DOI: 10.1109/ICDE48307.2020.00236
Pengfei Li, Hua Lu, Qian Zheng, Shijian Li, Gang Pan
This study explores the problem of co-location judgement, i.e., to decide whether two Twitter users are co-located at some point-of-interest (POI). We extract novel features, named HisRect, from users’ historical visits and recent tweets: The former has impact on where a user visits in general, whereas the latter gives more hints about where a user is currently. To alleviate the issue of data scarcity, a semi-supervised learning (SSL) framework is designed to extract HisRect features. Moreover, we use an embedding neural network layer to decide co-location based on the difference between two users’ His-Rect features. Extensive experiments on real Twitter data suggest that our HisRect features and SSL framework are highly effective at deciding co-locations.
{"title":"HisRect: Features from Historical Visits and Recent Tweet for Co-Location Judgement","authors":"Pengfei Li, Hua Lu, Qian Zheng, Shijian Li, Gang Pan","doi":"10.1109/ICDE48307.2020.00236","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00236","url":null,"abstract":"This study explores the problem of co-location judgement, i.e., to decide whether two Twitter users are co-located at some point-of-interest (POI). We extract novel features, named HisRect, from users’ historical visits and recent tweets: The former has impact on where a user visits in general, whereas the latter gives more hints about where a user is currently. To alleviate the issue of data scarcity, a semi-supervised learning (SSL) framework is designed to extract HisRect features. Moreover, we use an embedding neural network layer to decide co-location based on the difference between two users’ His-Rect features. Extensive experiments on real Twitter data suggest that our HisRect features and SSL framework are highly effective at deciding co-locations.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"21 1","pages":"2034-2035"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77428164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-04-01DOI: 10.1109/ICDE48307.2020.00091
Lisi Chen, Shuo Shang, Christian S. Jensen, Bin Yao, Panos Kalnis
Matching similar pairs of trajectories, called trajectory similarity join, is a fundamental functionality in spatial data management. We consider the problem of semantic trajectory similarity join (STS-Join). Each semantic trajectory is a sequence of Points-of-interest (POIs) with both location and text information. Thus, given two sets of semantic trajectories and a threshold θ, the STS-Join returns all pairs of semantic trajectories from the two sets with spatio-textual similarity no less than θ. This join targets applications such as term-based trajectory near-duplicate detection, geo-text data cleaning, personalized ridesharing recommendation, keyword-aware route planning, and travel itinerary recommendation.With these applications in mind, we provide a purposeful definition of spatio-textual similarity. To enable efficient STS-Join processing on large sets of semantic trajectories, we develop trajectory pair filtering techniques and consider the parallel processing capabilities of modern processors. Specifically, we present a two-phase parallel search algorithm. We first group semantic trajectories based on their text information. The algorithm’s per-group searches are independent of each other and thus can be performed in parallel. For each group, the trajectories are further partitioned based on the spatial domain. We generate spatial and textual summaries for each trajectory batch, based on which we develop batch filtering and trajectory-batch filtering techniques to prune unqualified trajectory pairs in a batch mode. Additionally, we propose an efficient divide-and-conquer algorithm to derive bounds of spatial similarity and textual similarity between two semantic trajectories, which enable us prune dissimilar trajectory pairs without the need of computing the exact value of spatio-textual similarity. Experimental study with large semantic trajectory data confirms that our algorithm of processing semantic trajectory join is capable of outperforming our well-designed baseline by a factor of 8–12.
{"title":"Parallel Semantic Trajectory Similarity Join","authors":"Lisi Chen, Shuo Shang, Christian S. Jensen, Bin Yao, Panos Kalnis","doi":"10.1109/ICDE48307.2020.00091","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00091","url":null,"abstract":"Matching similar pairs of trajectories, called trajectory similarity join, is a fundamental functionality in spatial data management. We consider the problem of semantic trajectory similarity join (STS-Join). Each semantic trajectory is a sequence of Points-of-interest (POIs) with both location and text information. Thus, given two sets of semantic trajectories and a threshold θ, the STS-Join returns all pairs of semantic trajectories from the two sets with spatio-textual similarity no less than θ. This join targets applications such as term-based trajectory near-duplicate detection, geo-text data cleaning, personalized ridesharing recommendation, keyword-aware route planning, and travel itinerary recommendation.With these applications in mind, we provide a purposeful definition of spatio-textual similarity. To enable efficient STS-Join processing on large sets of semantic trajectories, we develop trajectory pair filtering techniques and consider the parallel processing capabilities of modern processors. Specifically, we present a two-phase parallel search algorithm. We first group semantic trajectories based on their text information. The algorithm’s per-group searches are independent of each other and thus can be performed in parallel. For each group, the trajectories are further partitioned based on the spatial domain. We generate spatial and textual summaries for each trajectory batch, based on which we develop batch filtering and trajectory-batch filtering techniques to prune unqualified trajectory pairs in a batch mode. Additionally, we propose an efficient divide-and-conquer algorithm to derive bounds of spatial similarity and textual similarity between two semantic trajectories, which enable us prune dissimilar trajectory pairs without the need of computing the exact value of spatio-textual similarity. Experimental study with large semantic trajectory data confirms that our algorithm of processing semantic trajectory join is capable of outperforming our well-designed baseline by a factor of 8–12.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"40 1","pages":"997-1008"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74033183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-04-01DOI: 10.1109/ICDE48307.2020.00228
Sin G. Teo, Jianneng Cao, V. Lee
Secure multi-party computation (SMC) allows parties to jointly compute a function over their inputs, while keeping every input confidential. SMC has been extensively applied in tasks with privacy requirements, such as privacy-preserving data mining (PPDM), to learn task output and at the same time protect input data privacy. However, existing SMC-based solutions are ad-hoc – they are proposed for specific applications, and thus cannot be applied to other applications directly. To address this issue, we propose a privacy model DAG (Directed Acyclic Graph) that consists of a set of fundamental secure operators (e.g., +, −, ×, /, and power). Our model is general – its operators, if pipelined together, can implement various functions, even complicated ones. The experimental results also show that our DAG model can run in acceptable time.
{"title":"DAG: A General Model for Privacy-Preserving Data Mining : (Extended Abstract)","authors":"Sin G. Teo, Jianneng Cao, V. Lee","doi":"10.1109/ICDE48307.2020.00228","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00228","url":null,"abstract":"Secure multi-party computation (SMC) allows parties to jointly compute a function over their inputs, while keeping every input confidential. SMC has been extensively applied in tasks with privacy requirements, such as privacy-preserving data mining (PPDM), to learn task output and at the same time protect input data privacy. However, existing SMC-based solutions are ad-hoc – they are proposed for specific applications, and thus cannot be applied to other applications directly. To address this issue, we propose a privacy model DAG (Directed Acyclic Graph) that consists of a set of fundamental secure operators (e.g., +, −, ×, /, and power). Our model is general – its operators, if pipelined together, can implement various functions, even complicated ones. The experimental results also show that our DAG model can run in acceptable time.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"14 1","pages":"2018-2019"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81193458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-04-01DOI: 10.1109/ICDE48307.2020.00016
Wenlu Wang, Yingtao Tian, Haixun Wang, Wei-Shinn Ku
Relational database management systems (RDBMSs) are powerful because they are able to optimize and execute queries against relational databases. However, when it comes to NLIDB (natural language interface for databases), the entire system is often custom-made for a particular database. Overcoming the complexity and expressiveness of natural languages so that a single NLI can support a variety of databases is an unsolved problem. In this work, we show that it is possible to separate data specific components from latent semantic structures in expressing relational queries in a natural language. With the separation, transferring an NLI from one database to another becomes possible. We develop a neural network classifier to detect data specific components and an adversarial mechanism to locate them in a natural language question. We then introduce a general purpose transfer-learnable NLI that focuses on the latent semantic structure. We devise a deep sequence model that translates the latent semantic structure to an SQL query. Experiments show that our approach outperforms previous NLI methods on the WikiSQL [49] dataset, and the model we learned can be applied to other benchmark datasets without retraining.
{"title":"A Natural Language Interface for Database: Achieving Transfer-learnability Using Adversarial Method for Question Understanding","authors":"Wenlu Wang, Yingtao Tian, Haixun Wang, Wei-Shinn Ku","doi":"10.1109/ICDE48307.2020.00016","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00016","url":null,"abstract":"Relational database management systems (RDBMSs) are powerful because they are able to optimize and execute queries against relational databases. However, when it comes to NLIDB (natural language interface for databases), the entire system is often custom-made for a particular database. Overcoming the complexity and expressiveness of natural languages so that a single NLI can support a variety of databases is an unsolved problem. In this work, we show that it is possible to separate data specific components from latent semantic structures in expressing relational queries in a natural language. With the separation, transferring an NLI from one database to another becomes possible. We develop a neural network classifier to detect data specific components and an adversarial mechanism to locate them in a natural language question. We then introduce a general purpose transfer-learnable NLI that focuses on the latent semantic structure. We devise a deep sequence model that translates the latent semantic structure to an SQL query. Experiments show that our approach outperforms previous NLI methods on the WikiSQL [49] dataset, and the model we learned can be applied to other benchmark datasets without retraining.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"7 1","pages":"97-108"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86376204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-04-01DOI: 10.1109/ICDE48307.2020.00237
Tenindra Abeywickrama, M. A. Cheema, Arijit Khan
Given the prevalence and volume of local search queries, today’s search engines are required to find results by both spatial proximity and textual relevance at high query throughput. Existing techniques to answer such spatial keyword queries employ a keyword aggregation strategy that suffers from certain drawbacks when applied to road networks. Instead, we propose the K-SPIN framework, which uses an alternative keyword separation strategy that is more suitable on road networks. While this strategy was previously thought to entail prohibitive pre-processing costs, we further propose novel techniques to make our framework viable and even light-weight. Thorough experimentation shows that K-SPIN outperforms the state-of-the-art by up to two orders of magnitude on a wide range of settings and real-world datasets.
{"title":"K-SPIN: Efficiently Processing Spatial Keyword Queries on Road Networks : (Extended Abstract)","authors":"Tenindra Abeywickrama, M. A. Cheema, Arijit Khan","doi":"10.1109/ICDE48307.2020.00237","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00237","url":null,"abstract":"Given the prevalence and volume of local search queries, today’s search engines are required to find results by both spatial proximity and textual relevance at high query throughput. Existing techniques to answer such spatial keyword queries employ a keyword aggregation strategy that suffers from certain drawbacks when applied to road networks. Instead, we propose the K-SPIN framework, which uses an alternative keyword separation strategy that is more suitable on road networks. While this strategy was previously thought to entail prohibitive pre-processing costs, we further propose novel techniques to make our framework viable and even light-weight. Thorough experimentation shows that K-SPIN outperforms the state-of-the-art by up to two orders of magnitude on a wide range of settings and real-world datasets.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"69 2 1","pages":"2036-2037"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86437181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-04-01DOI: 10.1109/ICDE48307.2020.00022
Kazutoshi Umemoto, T. Milo, M. Kitsuregawa
How can recommender systems help people improve their skills? As a first step toward recommendation for the upskilling of users, this paper addresses the problems of modeling the improvement of user skills and the difficulty of items in action sequences where users select items at different times. We propose a progression model that uses latent variables to learn the monotonically non-decreasing progression of user skills. Once this model is trained with the given sequence data, we leverage it to find a statistical solution to the item difficulty estimation problem, where we assume that users usually select items within their skill capacity. Experiments on five datasets (four from real domains, and one generated synthetically) revealed that (1) our model successfully captured the progression of domain-dependent skills; (2) multi-faceted item features helped to learn better models that aligned well with the ground-truth skill and difficulty levels in the synthetic dataset; (3) the learned models were practically useful to predict items and ratings in action sequences; and (4) exploiting the dependency structure of our skill model for parallel computation made the training process more efficient.
{"title":"Toward Recommendation for Upskilling: Modeling Skill Improvement and Item Difficulty in Action Sequences","authors":"Kazutoshi Umemoto, T. Milo, M. Kitsuregawa","doi":"10.1109/ICDE48307.2020.00022","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00022","url":null,"abstract":"How can recommender systems help people improve their skills? As a first step toward recommendation for the upskilling of users, this paper addresses the problems of modeling the improvement of user skills and the difficulty of items in action sequences where users select items at different times. We propose a progression model that uses latent variables to learn the monotonically non-decreasing progression of user skills. Once this model is trained with the given sequence data, we leverage it to find a statistical solution to the item difficulty estimation problem, where we assume that users usually select items within their skill capacity. Experiments on five datasets (four from real domains, and one generated synthetically) revealed that (1) our model successfully captured the progression of domain-dependent skills; (2) multi-faceted item features helped to learn better models that aligned well with the ground-truth skill and difficulty levels in the synthetic dataset; (3) the learned models were practically useful to predict items and ratings in action sequences; and (4) exploiting the dependency structure of our skill model for parallel computation made the training process more efficient.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"19 1","pages":"169-180"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86634860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data-driven recommender systems that can help to predict users’ preferences are deployed in many real online service platforms. Several studies show that they are vulnerable to data poisoning attacks, and attackers have the ability to mislead the system to perform as their desires. Considering the realistic scenario, where the recommender system is usually a black-box for attackers and complex algorithms may be deployed in them, how to learn effective attack strategies on such recommender systems is still an under-explored problem. In this paper, we propose an adaptive data poisoning framework, PoisonRec, which can automatically learn effective attack strategies on various recommender systems with very limited knowledge. PoisonRec leverages the reinforcement learning architecture, in which an attack agent actively injects fake data (user behaviors) into the recommender system, and then can improve its attack strategies through reward signals that are available under the strict black-box setting. Specifically, we model the attack behavior trajectory as the Markov Decision Process (MDP) in reinforcement learning. We also design a Biased Complete Binary Tree (BCBT) to reformulate the action space for better attack performance. We adopt 8 widely-used representative recommendation algorithms as our testbeds, and make extensive experiments on 4 different real-world datasets. The results show that PoisonRec has the ability to achieve good attack performance on various recommender systems with limited knowledge.
{"title":"PoisonRec: An Adaptive Data Poisoning Framework for Attacking Black-box Recommender Systems","authors":"Junshuai Song, Zhao Li, Zehong Hu, Yucheng Wu, Zhenpeng Li, Jian Li, Jun Gao","doi":"10.1109/ICDE48307.2020.00021","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00021","url":null,"abstract":"Data-driven recommender systems that can help to predict users’ preferences are deployed in many real online service platforms. Several studies show that they are vulnerable to data poisoning attacks, and attackers have the ability to mislead the system to perform as their desires. Considering the realistic scenario, where the recommender system is usually a black-box for attackers and complex algorithms may be deployed in them, how to learn effective attack strategies on such recommender systems is still an under-explored problem. In this paper, we propose an adaptive data poisoning framework, PoisonRec, which can automatically learn effective attack strategies on various recommender systems with very limited knowledge. PoisonRec leverages the reinforcement learning architecture, in which an attack agent actively injects fake data (user behaviors) into the recommender system, and then can improve its attack strategies through reward signals that are available under the strict black-box setting. Specifically, we model the attack behavior trajectory as the Markov Decision Process (MDP) in reinforcement learning. We also design a Biased Complete Binary Tree (BCBT) to reformulate the action space for better attack performance. We adopt 8 widely-used representative recommendation algorithms as our testbeds, and make extensive experiments on 4 different real-world datasets. The results show that PoisonRec has the ability to achieve good attack performance on various recommender systems with limited knowledge.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"23 1","pages":"157-168"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88396393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-04-01DOI: 10.1109/ICDE48307.2020.00023
Chen Zhang, Fan Zhang, W. Zhang, Boge Liu, Ying Zhang, Lu Qin, Xuemin Lin
In this paper, we propose and study a novel cohesive subgraph model, named (k,p)-core, which is a maximal subgraph where each vertex has at least k neighbours and at least p fraction of its neighbours in the subgraph. The model is motivated by the finding that each user in a community should have at least a certain fraction p of neighbors inside the community to ensure user engagement, especially for users with large degrees. Meanwhile, the uniform degree constraint k, as applied in the k-core model, guarantees a minimum level of user engagement in a community, and is especially effective for users with small degrees. We propose an O(m) algorithm to compute a (k,p)-core with given k and p, and an O(dm) algorithm to decompose a graph by (k,p)-core, where m is the number of edges in the graph G and d is the degeneracy of G. A space efficient index is designed for time-optimal (k,p)-core query processing. Novel techniques are proposed for the maintenance of (k,p)-core index against graph dynamic. Extensive experiments on 8 reallife datasets demonstrate that our (k,p)-core model is effective and the algorithms are efficient.
{"title":"Exploring Finer Granularity within the Cores: Efficient (k,p)-Core Computation","authors":"Chen Zhang, Fan Zhang, W. Zhang, Boge Liu, Ying Zhang, Lu Qin, Xuemin Lin","doi":"10.1109/ICDE48307.2020.00023","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00023","url":null,"abstract":"In this paper, we propose and study a novel cohesive subgraph model, named (k,p)-core, which is a maximal subgraph where each vertex has at least k neighbours and at least p fraction of its neighbours in the subgraph. The model is motivated by the finding that each user in a community should have at least a certain fraction p of neighbors inside the community to ensure user engagement, especially for users with large degrees. Meanwhile, the uniform degree constraint k, as applied in the k-core model, guarantees a minimum level of user engagement in a community, and is especially effective for users with small degrees. We propose an O(m) algorithm to compute a (k,p)-core with given k and p, and an O(dm) algorithm to decompose a graph by (k,p)-core, where m is the number of edges in the graph G and d is the degeneracy of G. A space efficient index is designed for time-optimal (k,p)-core query processing. Novel techniques are proposed for the maintenance of (k,p)-core index against graph dynamic. Extensive experiments on 8 reallife datasets demonstrate that our (k,p)-core model is effective and the algorithms are efficient.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"14 1","pages":"181-192"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77833439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}