2020 IEEE 36th International Conference on Data Engineering (ICDE): Latest Papers


Mobility-Aware Dynamic Taxi Ridesharing

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00088

Zhidan Liu, Zengyang Gong, Jiangzhou Li, Kaishun Wu

Taxi ridesharing has become promising and attractive because of the wide availability of taxis in a city and the tremendous benefits of ridesharing, e.g., alleviating traffic congestion and reducing energy consumption. Existing taxi ridesharing schemes, however, are neither efficient nor practical, because they simply match ride requests and taxis based on partial trip information and omit offline passengers, who hail a taxi at the roadside without submitting explicit requests to the system. In this paper, we consider the mobility-aware taxi ridesharing problem and present mT-Share to address these limitations. mT-Share fully exploits the mobility information of ride requests and taxis to achieve efficient indexing of taxis/requests and better passenger-taxi matching, while still satisfying the constraints on passengers’ deadlines and taxis’ capacities. Specifically, mT-Share indexes taxis and ride requests with both geographical information and travel directions, and supports shortest-path-based routing and probabilistic routing to serve both online and offline ride requests. Extensive experiments with a large real-world taxi dataset demonstrate the efficiency and effectiveness of mT-Share, which can respond to each ride request in milliseconds with a moderate detour cost. Compared to state-of-the-art methods, mT-Share serves 42% and 62% more ride requests in peak and non-peak hours, respectively.
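The abstract says mT-Share indexes taxis and requests by both location and travel direction. As a rough illustration only, and not mT-Share's actual index, the Python sketch below keys each taxi by a grid cell plus a heading sector. The cell size, the number of sectors, the `MobilityIndex` class, and the neighbour-search rule are all hypothetical choices, and the deadline and capacity checks are omitted.

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0   # hypothetical grid cell width, in coordinate units
NUM_SECTORS = 8   # hypothetical number of 45-degree heading buckets


def heading_sector(origin, destination):
    """Bucket the travel direction from origin to destination into a sector index."""
    angle = math.atan2(destination[1] - origin[1], destination[0] - origin[0])
    width = 2 * math.pi / NUM_SECTORS
    return int((angle % (2 * math.pi)) / width) % NUM_SECTORS


def index_key(position, destination):
    """Combine a grid cell with a heading sector into one index key."""
    cell = (math.floor(position[0] / CELL_SIZE), math.floor(position[1] / CELL_SIZE))
    return cell, heading_sector(position, destination)


class MobilityIndex:
    """Hypothetical grid + direction index (not the structure used by mT-Share)."""

    def __init__(self):
        self.buckets = defaultdict(set)

    def insert(self, taxi_id, position, destination):
        self.buckets[index_key(position, destination)].add(taxi_id)

    def candidates(self, pickup, dropoff):
        """Taxis in the 3x3 surrounding cells heading in the same or an adjacent sector."""
        (cx, cy), sector = index_key(pickup, dropoff)
        found = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for ds in (-1, 0, 1):
                    key = ((cx + dx, cy + dy), (sector + ds) % NUM_SECTORS)
                    found |= self.buckets.get(key, set())
        return found


index = MobilityIndex()
index.insert("A", (0.5, 0.5), (5.0, 0.5))   # heading east
index.insert("B", (0.5, 0.5), (-5.0, 0.5))  # heading west
print(index.candidates((1.2, 0.5), (6.0, 0.6)))  # {'A'}
```

Filtering by heading as well as location means that a taxi driving away from the request's destination is never offered as a candidate, even when it is close by.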

Pages: 961-972

Citations: 25

Efficient Top-k Edge Structural Diversity Search

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00025

Qi Zhang, Ronghua Li, Qixuan Yang, Guoren Wang, Lu Qin

The structural diversity of an edge, measured by the number of connected components of the edge’s ego-network, has recently been recognized as a key metric for analyzing social influence and information diffusion in social networks. An important problem in social network analysis is therefore to identify the top-k edges with the highest structural diversity. In this work, we perform the first systematic study of the top-k edge structural diversity search problem on large graphs. Specifically, we first develop a new online search framework with two basic upper-bounding rules to solve this problem efficiently. Then, we propose a new index structure using near-linear space to process top-k edge structural diversity search in near-optimal time. To build this index, we devise an efficient algorithm based on an interesting connection between our problem and the 4-clique enumeration problem. In addition, we propose efficient index maintenance techniques to handle dynamic graphs. Extensive experiments on five large real-life datasets demonstrate the efficiency, scalability, and effectiveness of our algorithms.
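For intuition, here is a brute-force Python sketch of the quantity being ranked. It assumes, as is common for edge ego-networks, that the ego-network of an edge (u, v) is the subgraph induced by the common neighbours of u and v. Under that assumption, two adjacent common neighbours w and x together with u and v form a 4-clique, which hints at the connection to 4-clique enumeration mentioned above. This baseline scores every edge and keeps the best k. It has none of the paper's upper-bounding rules or index, and the function names are illustrative.

```python
import heapq
from collections import defaultdict


def build_adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_structural_diversity(adj, u, v):
    """Count connected components in the subgraph induced by N(u) ∩ N(v)."""
    common = adj[u] & adj[v]
    seen = set()
    components = 0
    for start in common:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first search restricted to common neighbours
            w = stack.pop()
            for x in adj[w] & common:
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
    return components


def top_k_edges(edges, k):
    """Return the k edges with the highest structural diversity as (score, edge) pairs."""
    adj = build_adjacency(edges)
    scored = ((edge_structural_diversity(adj, u, v), (u, v)) for u, v in edges)
    return heapq.nlargest(k, scored)


edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (1, 5), (2, 5), (3, 4)]
print(top_k_edges(edges, 1))  # [(2, (1, 2))]
```

In the example, the common neighbours of 1 and 2 are {3, 4, 5}. Vertices 3 and 4 are adjacent, so the ego-network of edge (1, 2) has two components.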

Pages: 205-216

Citations: 7

HisRect: Features from Historical Visits and Recent Tweet for Co-Location Judgement

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00236

Pengfei Li, Hua Lu, Qian Zheng, Shijian Li, Gang Pan

This study explores the problem of co-location judgement, i.e., deciding whether two Twitter users are co-located at some point of interest (POI). We extract novel features, named HisRect, from users’ historical visits and recent tweets: the former indicate where a user tends to visit in general, whereas the latter give more hints about where a user currently is. To alleviate data scarcity, a semi-supervised learning (SSL) framework is designed to extract HisRect features. Moreover, we use an embedding neural network layer to decide co-location based on the difference between two users’ HisRect features. Extensive experiments on real Twitter data suggest that our HisRect features and SSL framework are highly effective at deciding co-locations.

Citations: 0

Parallel Semantic Trajectory Similarity Join

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00091

Lisi Chen, Shuo Shang, Christian S. Jensen, Bin Yao, Panos Kalnis

Matching similar pairs of trajectories, called trajectory similarity join, is a fundamental functionality in spatial data management. We consider the problem of semantic trajectory similarity join (STS-Join). Each semantic trajectory is a sequence of points of interest (POIs) with both location and text information. Given two sets of semantic trajectories and a threshold θ, the STS-Join returns all pairs of semantic trajectories from the two sets whose spatio-textual similarity is no less than θ. This join targets applications such as term-based trajectory near-duplicate detection, geo-text data cleaning, personalized ridesharing recommendation, keyword-aware route planning, and travel itinerary recommendation. With these applications in mind, we provide a purposeful definition of spatio-textual similarity. To enable efficient STS-Join processing on large sets of semantic trajectories, we develop trajectory pair filtering techniques and exploit the parallel processing capabilities of modern processors. Specifically, we present a two-phase parallel search algorithm. We first group semantic trajectories based on their text information. The algorithm’s per-group searches are independent of each other and can thus be performed in parallel. Within each group, the trajectories are further partitioned based on the spatial domain. We generate spatial and textual summaries for each trajectory batch, and based on these summaries we develop batch filtering and trajectory-batch filtering techniques to prune unqualified trajectory pairs in batch mode. Additionally, we propose an efficient divide-and-conquer algorithm to derive bounds on the spatial and textual similarity between two semantic trajectories. These bounds let us prune dissimilar trajectory pairs without computing their exact spatio-textual similarity. An experimental study with large semantic trajectory data confirms that our STS-Join algorithm outperforms our well-designed baseline by a factor of 8–12.
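The abstract does not spell out the paper's spatio-textual similarity, so the Python sketch below uses a hypothetical one. The score is a weighted sum (weight `alpha`) of two parts: a symmetric best-match spatial closeness and the Jaccard similarity of the trajectories' keyword sets. The sketch shows the shape of a threshold join and one bound-based filter. Because the spatial term never exceeds 1, any pair whose best possible score is below θ can be pruned before its spatial similarity is computed. This is a single-threaded nested-loop baseline, not the paper's two-phase parallel algorithm.

```python
import math
from itertools import product


def spatial_sim(a, b):
    """Hypothetical spatial similarity in [0, 1]: symmetric mean of best-match exp(-distance)."""
    def directed(p, q):
        return sum(max(math.exp(-math.dist(pt[:2], qt[:2])) for qt in q) for pt in p) / len(p)
    return 0.5 * (directed(a, b) + directed(b, a))


def textual_sim(a, b):
    """Jaccard similarity of the keyword sets of two trajectories."""
    ka = set().union(*(poi[2] for poi in a))
    kb = set().union(*(poi[2] for poi in b))
    if not (ka | kb):
        return 0.0
    return len(ka & kb) / len(ka | kb)


def sts_join(R, S, theta, alpha=0.5):
    """Return index pairs (i, j) whose combined similarity is at least theta."""
    result = []
    for (i, r), (j, s) in product(enumerate(R), enumerate(S)):
        t = textual_sim(r, s)
        # spatial_sim <= 1, so alpha + (1 - alpha) * t bounds the combined score from above
        if alpha + (1 - alpha) * t < theta:
            continue
        if alpha * spatial_sim(r, s) + (1 - alpha) * t >= theta:
            result.append((i, j))
    return result


# Each POI is (x, y, keywords); each semantic trajectory is a list of POIs.
R = [[(0, 0, {"cafe"}), (1, 0, {"museum"})]]
S = [[(0, 0, {"cafe"}), (1, 0, {"museum"})], [(10, 10, {"bar"})]]
print(sts_join(R, S, theta=0.8))  # [(0, 0)]
```

The second trajectory in `S` shares no keywords with `R[0]`. Its upper bound is therefore 0.5, which is below 0.8, and the pair is rejected without any distance computation.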

Pages: 997-1008

Citations: 32

DAG: A General Model for Privacy-Preserving Data Mining (Extended Abstract)

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00228

Sin G. Teo, Jianneng Cao, V. Lee

Secure multi-party computation (SMC) allows parties to jointly compute a function over their inputs while keeping every input confidential. SMC has been extensively applied to tasks with privacy requirements, such as privacy-preserving data mining (PPDM), to compute the task output while protecting the privacy of the input data. However, existing SMC-based solutions are ad hoc: they are proposed for specific applications and thus cannot be applied directly to other applications. To address this issue, we propose DAG (Directed Acyclic Graph), a privacy model that consists of a set of fundamental secure operators (e.g., +, −, ×, /, and power). Our model is general: when its operators are pipelined together, they can implement various functions, including complicated ones. The experimental results also show that our DAG model runs in acceptable time.
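To make the idea of a secure operator concrete, the Python sketch below uses textbook additive secret sharing over a prime field. This is a standard SMC building block, not necessarily the DAG model's protocol. Each party holds one share of every input. Secure addition and subtraction are computed locally on shares, and their output shares can feed the next operator, which is the kind of pipelining the abstract describes. The modulus is an arbitrary choice. Operators such as ×, /, and power require interaction between parties and are not shown.

```python
import secrets

PRIME = 2**61 - 1  # arbitrary Mersenne prime used as the field modulus


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    return sum(shares) % PRIME


def secure_add(shares_a, shares_b):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % PRIME for a, b in zip(shares_a, shares_b)]


def secure_sub(shares_a, shares_b):
    """Each party subtracts its own two shares locally."""
    return [(a - b) % PRIME for a, b in zip(shares_a, shares_b)]


# Pipelining operators: (x + y) - z, evaluated on shares held by three parties.
x, y, z = share(20, 3), share(30, 3), share(8, 3)
print(reconstruct(secure_sub(secure_add(x, y), z)))  # 42
```

No single share reveals anything about its input, because each share is uniformly random on its own. Only the final reconstruction combines them.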

Citations: 2

A Natural Language Interface for Database: Achieving Transfer-learnability Using Adversarial Method for Question Understanding

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00016

Wenlu Wang, Yingtao Tian, Haixun Wang, Wei-Shinn Ku

Relational database management systems (RDBMSs) are powerful because they can optimize and execute queries against relational databases. However, a natural language interface for databases (NLIDB) is often built entirely for one particular database. Overcoming the complexity and expressiveness of natural languages so that a single NLI can support a variety of databases remains an unsolved problem. In this work, we show that, when relational queries are expressed in natural language, it is possible to separate data-specific components from the latent semantic structure. This separation makes it possible to transfer an NLI from one database to another. We develop a neural network classifier to detect data-specific components and an adversarial mechanism to locate them in a natural language question. We then introduce a general-purpose, transfer-learnable NLI that focuses on the latent semantic structure. We also devise a deep sequence model that translates this latent semantic structure into an SQL query. Experiments show that our approach outperforms previous NLI methods on the WikiSQL [49] dataset, and that the learned model can be applied to other benchmark datasets without retraining.
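The separation of data-specific components from the latent semantic structure can be pictured with a toy Python sketch. Here, plain string matching against a known schema stands in for the paper's neural classifier and adversarial locator. The placeholder format (`<col0>`, `<val0>`, ...), the table name `t`, and the helper names are hypothetical. Schema mentions are replaced by placeholders to obtain a database-independent question template, and the placeholders in a generated SQL template are then mapped back.

```python
import re


def delexicalize(question, column_names, cell_values):
    """Replace schema-specific mentions with placeholders; return (template, mapping)."""
    mapping = {}
    template = question
    for kind, items in (("col", column_names), ("val", cell_values)):
        count = 0
        # longer mentions first, so that "first name" is matched before "name"
        for item in sorted(items, key=len, reverse=True):
            pattern = re.compile(r"\b" + re.escape(item) + r"\b", re.IGNORECASE)
            if pattern.search(template):
                placeholder = f"<{kind}{count}>"
                template = pattern.sub(placeholder, template)
                mapping[placeholder] = item
                count += 1
    return template, mapping


def relexicalize(sql_template, mapping):
    """Substitute the original schema mentions back into a SQL template."""
    for placeholder, text in mapping.items():
        sql_template = sql_template.replace(placeholder, text)
    return sql_template


template, mapping = delexicalize(
    "What is the capital where country is France?", ["capital", "country"], ["France"])
print(template)  # What is the <col0> where <col1> is <val0>?
print(relexicalize("SELECT <col0> FROM t WHERE <col1> = '<val0>'", mapping))
```

Only the placeholder template would be seen by a transferable model, which is why the same model could in principle serve a different schema.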

Pages: 97-108

Citations: 9

K-SPIN: Efficiently Processing Spatial Keyword Queries on Road Networks (Extended Abstract)

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00237

Tenindra Abeywickrama, M. A. Cheema, Arijit Khan

Given the prevalence and volume of local search queries, today’s search engines are required to find results by both spatial proximity and textual relevance at high query throughput. Existing techniques to answer such spatial keyword queries employ a keyword aggregation strategy that suffers from certain drawbacks when applied to road networks. Instead, we propose the K-SPIN framework, which uses an alternative keyword separation strategy that is more suitable on road networks. While this strategy was previously thought to entail prohibitive pre-processing costs, we further propose novel techniques to make our framework viable and even light-weight. Thorough experimentation shows that K-SPIN outperforms the state-of-the-art by up to two orders of magnitude on a wide range of settings and real-world datasets.
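Here is a minimal Python sketch of the keyword-separation idea under stated assumptions. A separate inverted list per keyword restricts a query to the objects that carry that keyword. A Dijkstra expansion over the road network, given as a hypothetical weighted adjacency list, then returns the k nearest such objects by network distance and stops as soon as k are found. K-SPIN's own indexes and pre-processing techniques are not reproduced here.

```python
import heapq
import math
from collections import defaultdict


def build_inverted_index(objects):
    """Map each keyword to the set of vertices whose object carries it."""
    index = defaultdict(set)
    for vertex, keywords in objects.items():
        for kw in keywords:
            index[kw].add(vertex)
    return index


def keyword_knn(graph, index, source, keyword, k):
    """k nearest vertices (by network distance) whose object contains keyword."""
    candidates = index.get(keyword, set())
    if not candidates:
        return []
    dist = {source: 0}
    heap = [(0, source)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u in candidates:
            result.append((u, d))
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


# Undirected road network as adjacency lists of (neighbour, edge length).
graph = {
    "a": [("b", 1), ("d", 5)],
    "b": [("a", 1), ("c", 2)],
    "c": [("b", 2)],
    "d": [("a", 5)],
}
index = build_inverted_index({"b": {"bar"}, "c": {"cafe"}, "d": {"cafe"}})
print(keyword_knn(graph, index, "a", "cafe", 2))  # [('c', 3), ('d', 5)]
```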

Citations: 0

Toward Recommendation for Upskilling: Modeling Skill Improvement and Item Difficulty in Action Sequences

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00022

Kazutoshi Umemoto, T. Milo, M. Kitsuregawa

How can recommender systems help people improve their skills? As a first step toward recommendation for the upskilling of users, this paper addresses the problems of modeling the improvement of user skills and the difficulty of items in action sequences where users select items at different times. We propose a progression model that uses latent variables to learn the monotonically non-decreasing progression of user skills. Once this model is trained with the given sequence data, we leverage it to find a statistical solution to the item difficulty estimation problem, where we assume that users usually select items within their skill capacity. Experiments on five datasets (four from real domains, and one generated synthetically) revealed that (1) our model successfully captured the progression of domain-dependent skills; (2) multi-faceted item features helped to learn better models that aligned well with the ground-truth skill and difficulty levels in the synthetic dataset; (3) the learned models were practically useful to predict items and ratings in action sequences; and (4) exploiting the dependency structure of our skill model for parallel computation made the training process more efficient.
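The monotonically non-decreasing skill constraint can be illustrated with a small dynamic program. The Python sketch below is a hypothetical simplification, not the paper's latent-variable model. It assumes item difficulties are known integers. Each action in one user's sequence is assigned a skill level from 1 to `num_levels`, non-decreasing over time, so as to minimize the total absolute gap between item difficulty and skill.

```python
import math


def fit_monotone_skills(difficulties, num_levels):
    """Non-decreasing skill levels (1..num_levels) minimizing sum |difficulty - level|."""
    if not difficulties:
        return []
    n = len(difficulties)
    levels = range(1, num_levels + 1)
    # cost[t][s]: best cost of actions 0..t with action t assigned level s
    cost = [[math.inf] * (num_levels + 1) for _ in range(n)]
    back = [[0] * (num_levels + 1) for _ in range(n)]
    for s in levels:
        cost[0][s] = abs(difficulties[0] - s)
    for t in range(1, n):
        best_prev, best_s = math.inf, 0
        for s in levels:
            # running minimum over previous levels <= s enforces monotonicity
            if cost[t - 1][s] < best_prev:
                best_prev, best_s = cost[t - 1][s], s
            cost[t][s] = best_prev + abs(difficulties[t] - s)
            back[t][s] = best_s
    s = min(levels, key=lambda lvl: cost[n - 1][lvl])
    path = [s]
    for t in range(n - 1, 0, -1):  # follow back-pointers to recover the sequence
        s = back[t][s]
        path.append(s)
    return path[::-1]


print(fit_monotone_skills([1, 2, 2, 4], 4))  # [1, 2, 2, 4]
```

The running minimum makes each time step cost O(num_levels), so the whole sequence is fitted in O(n · num_levels) time.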

Pages: 169-180

Citations: 9

PoisonRec: An Adaptive Data Poisoning Framework for Attacking Black-box Recommender Systems

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00021

Junshuai Song, Zhao Li, Zehong Hu, Yucheng Wu, Zhenpeng Li, Jian Li, Jun Gao

Data-driven recommender systems that help predict users’ preferences are deployed on many real online service platforms. Several studies show that these systems are vulnerable to data poisoning attacks and that attackers can mislead them into behaving as the attackers desire. In realistic scenarios, however, the recommender system is usually a black box to attackers and may run complex algorithms, so how to learn effective attack strategies against such systems remains under-explored. In this paper, we propose PoisonRec, an adaptive data poisoning framework that can automatically learn effective attack strategies against various recommender systems with very limited knowledge. PoisonRec adopts a reinforcement learning architecture. An attack agent actively injects fake data (user behaviors) into the recommender system and then improves its attack strategies through reward signals that are available under the strict black-box setting. Specifically, we model the attack behavior trajectory as a Markov Decision Process (MDP) in reinforcement learning. We also design a Biased Complete Binary Tree (BCBT) to reformulate the action space for better attack performance. We adopt 8 widely used representative recommendation algorithms as testbeds and conduct extensive experiments on 4 different real-world datasets. The results show that PoisonRec achieves good attack performance against various recommender systems with limited knowledge.
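One way to read "reformulating the action space" with a binary tree is sketched below. Choosing one of N items becomes a sequence of ⌈log2 N⌉ left/right decisions, so a policy only ever chooses between two actions per step. This Python sketch shows only that generic encoding. The abstract does not describe the bias that gives BCBT its name or how PoisonRec arranges items in the tree, and neither is modelled here.

```python
import math


def tree_depth(num_items):
    """Depth of a complete binary tree with at least num_items leaves."""
    return max(1, math.ceil(math.log2(num_items)))


def item_to_path(item_index, num_items):
    """Encode an item as root-to-leaf left(0)/right(1) decisions, most significant bit first.

    When num_items is not a power of two, some leaves correspond to no item.
    """
    depth = tree_depth(num_items)
    return [(item_index >> (depth - 1 - level)) & 1 for level in range(depth)]


def path_to_item(path):
    """Decode a sequence of binary decisions back into an item index."""
    index = 0
    for bit in path:
        index = (index << 1) | bit
    return index


# With 1000 items, each choice takes 10 binary actions instead of one 1000-way action.
print(tree_depth(1000), item_to_path(5, 8))  # 10 [1, 0, 1]
```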

Pages: 157-168

Citations: 46

Exploring Finer Granularity within the Cores: Efficient (k,p)-Core Computation

2020 IEEE 36th International Conference on Data Engineering (ICDE)

Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00023

Chen Zhang, Fan Zhang, W. Zhang, Boge Liu, Ying Zhang, Lu Qin, Xuemin Lin

In this paper, we propose and study a novel cohesive subgraph model, named (k,p)-core: a maximal subgraph in which each vertex has at least k neighbours and at least a fraction p of its neighbours in the subgraph. The model is motivated by the finding that each user in a community should have at least a certain fraction p of their neighbours inside the community to ensure user engagement, especially for users with large degrees. Meanwhile, the uniform degree constraint k, as applied in the k-core model, guarantees a minimum level of user engagement in a community and is especially effective for users with small degrees. We propose an O(m) algorithm to compute the (k,p)-core for given k and p, and an O(dm) algorithm to decompose a graph by (k,p)-core, where m is the number of edges in the graph G and d is the degeneracy of G. A space-efficient index is designed for time-optimal (k,p)-core query processing. Novel techniques are proposed for maintaining the (k,p)-core index as the graph changes. Extensive experiments on 8 real-life datasets demonstrate that our (k,p)-core model is effective and the algorithms are efficient.
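The definition suggests a peeling procedure. A vertex v with original degree deg_G(v) survives only while its degree inside the current subgraph is at least max(k, ⌈p · deg_G(v)⌉). Removing a vertex lowers only its neighbours' degrees, so each edge is examined a constant number of times, which is consistent with the O(m) bound stated above. The Python sketch below implements that peeling for given k and p. It is our reading of the definition, not the authors' code, and it omits decomposition, indexing, and maintenance. The parameter p is converted to a `Fraction` to avoid floating-point rounding in the ceiling (for example, 0.3 × 10 in floating point is slightly above 3).

```python
import math
from collections import deque
from fractions import Fraction


def kp_core(adj, k, p):
    """Vertices of the (k,p)-core of an undirected graph given as {vertex: set(neighbours)}."""
    p = Fraction(p).limit_denominator(10**6)  # e.g. 0.3 -> 3/10 exactly
    # a vertex must keep at least max(k, ceil(p * original degree)) neighbours
    need = {v: max(k, math.ceil(p * len(ns))) for v, ns in adj.items()}
    deg = {v: len(ns) for v, ns in adj.items()}
    removed = {v for v in adj if deg[v] < need[v]}
    queue = deque(removed)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in removed:
                continue
            deg[w] -= 1  # w loses neighbour u
            if deg[w] < need[w]:
                removed.add(w)
                queue.append(w)
    return set(adj) - removed


# Triangle {1, 2, 3} with a pendant vertex 4 attached to vertex 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(kp_core(adj, 2, 0.5))  # {1, 2, 3}
print(kp_core(adj, 2, 0.8))  # set()
```

With p = 0.8, vertex 1 needs ⌈0.8 · 3⌉ = 3 neighbours. Once the pendant vertex 4 is peeled, vertex 1 fails this requirement, and the removal cascades through the triangle.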


Pages: 181-192

Citations: 21
