
Latest publications: 2020 IEEE 36th International Conference on Data Engineering (ICDE)

HBP: Hotness Balanced Partition for Prioritized Iterative Graph Computations
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00209
Shufeng Gong, Yanfeng Zhang, Ge Yu
Existing graph partition methods are designed for round-robin synchronous distributed frameworks. They balance workload without regard to vertex importance and fail to consider the characteristics of priority-based scheduling, which may limit the benefit of prioritized graph computation. To accelerate prioritized iterative graph computations, we propose Hotness Balanced Partition (HBP) and a stream-based partition algorithm, Pb-HBP. Pb-HBP partitions the graph by distributing vertices according to their hotness rather than blindly distributing vertices with equal weights, aiming to spread the hot vertices evenly among workers. Our results show that the proposed partition method outperforms the state-of-the-art partition methods Fennel and HotGraph. Specifically, Pb-HBP reduces runtime by 40–90% relative to hash partitioning, by 5–75% relative to Fennel, and by 22–50% relative to HotGraph.
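To make the core idea concrete, here is a minimal Python sketch of a hotness-balanced streaming partitioner: each arriving vertex carries a hotness score and is greedily assigned to the worker with the smallest accumulated hotness. The hotness values and the greedy rule are illustrative assumptions; the actual Pb-HBP also accounts for edge placement and priority-based scheduling.

```python
# Illustrative sketch of a hotness-balanced streaming partitioner.
# The hotness scores and the greedy rule are assumptions for exposition;
# the paper's Pb-HBP additionally considers edge locality and priorities.

def hotness_balanced_partition(vertex_stream, num_workers):
    """Assign each (vertex, hotness) pair to the worker with the
    smallest accumulated hotness, so hot vertices spread evenly."""
    loads = [0.0] * num_workers          # accumulated hotness per worker
    assignment = {}
    for vertex, hotness in vertex_stream:
        target = min(range(num_workers), key=lambda w: loads[w])
        assignment[vertex] = target
        loads[target] += hotness
    return assignment

if __name__ == "__main__":
    # Toy stream: (vertex id, hotness), e.g. hotness ~ PageRank mass.
    stream = [(0, 5.0), (1, 0.2), (2, 4.8), (3, 0.1), (4, 5.1), (5, 0.3)]
    print(hotness_balanced_partition(stream, num_workers=2))
```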
Citations: 5
StructSim: Querying Structural Node Similarity at Billion Scale
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00211
Xiaoshuang Chen, Longbin Lai, Lu Qin, Xuemin Lin
Structural node similarity is widely used in analyzing complex networks. As one of the structural node similarity metrics, role similarity has the merit of indicating automorphism (isomorphism). Existing algorithms for computing role similarity (e.g., RoleSim and NED) suffer from severe performance bottlenecks and thus cannot handle large real-world graphs. In this paper, we propose a new framework, StructSim, to compute nodes' role similarity. Under this framework, we prove that StructSim is guaranteed to be an admissible role similarity metric based on maximum matching. Because maximum matching is too costly to scale, we devise BinCount matching to speed up the computation. BinCount-based StructSim admits a precomputed index that answers a single-pair query in O(k log D) time, where k is a small user-defined parameter and D is the maximum node degree. Extensive empirical studies show that StructSim is significantly faster than existing works for computing structural node similarities on real-world graphs, with comparable effectiveness.
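The following toy sketch conveys the flavor of a BinCount-style signature: each node's neighborhood is summarized by counting neighbor degrees into exponentially sized bins, giving O(log D) bins, and two nodes are compared via their bin-count vectors. The exact signature, matching, and index structure in StructSim differ; everything here is a simplified assumption.

```python
import math
from collections import defaultdict

# Toy BinCount-flavored signature: count neighbor degrees into
# exponential bins, then compare nodes by L1 distance of the vectors.
# This is an illustrative simplification, not StructSim's algorithm.

def bincount_signature(graph, node):
    """graph: dict node -> set of neighbors."""
    sig = defaultdict(int)
    for nbr in graph[node]:
        bin_id = int(math.log2(len(graph[nbr]))) if graph[nbr] else 0
        sig[bin_id] += 1
    return dict(sig)

def signature_distance(sig_a, sig_b):
    """L1 distance between two bin-count vectors, a rough proxy
    for structural dissimilarity."""
    bins = set(sig_a) | set(sig_b)
    return sum(abs(sig_a.get(b, 0) - sig_b.get(b, 0)) for b in bins)

if __name__ == "__main__":
    g = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
    print(signature_distance(bincount_signature(g, 0),
                             bincount_signature(g, 2)))
```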
Citations: 15
Indoor Mobility Semantics Annotation Using Coupled Conditional Markov Networks
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00128
Huan Li, Hua Lu, M. A. Cheema, L. Shou, Gang Chen
Indoor mobility semantics analytics can greatly benefit many pertinent applications. Existing semantic annotation methods mainly focus on outdoor space and require extra knowledge such as POI categories or the regularity of human activity. However, these conditions are difficult to meet in indoor venues, which have relatively small extents but complex topology. This work studies the annotation of indoor mobility semantics, which describe an object's mobility event (what) at a semantic indoor region (where) during a time period (when). We propose a coupled conditional Markov network (C2MN) with a set of feature functions carefully designed to incorporate indoor topology and mobility behaviors. C2MN is able to capture probabilistic dependencies among positioning records, semantic regions, and mobility events jointly. Nevertheless, the correlation of regions and events hinders parameter learning. Therefore, we devise an alternate learning algorithm to enable parameter learning over correlated variables. Extensive experiments demonstrate that our C2MN-based semantic annotation is efficient and effective on both real and synthetic indoor mobility data.
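As a rough illustration of jointly labeling coupled (region, event) chains with feature functions, the toy script below scores candidate label sequences with hand-written features and picks the best joint assignment by brute force. All features, weights, and labels are invented for exposition; this is not the paper's C2MN model or its learning algorithm.

```python
import itertools

# Toy coupled-labeling sketch (not the paper's C2MN): score joint
# (region, event) label sequences with features that couple the two
# chains, then pick the best assignment by brute force on a tiny input.

REGIONS = ["lobby", "office"]
EVENTS = ["walk", "stay"]

def score(obs, regions, events):
    s = 0.0
    for t, o in enumerate(obs):
        # emission-like feature: low speed suggests "stay"
        s += 1.0 if (o["speed"] < 0.3) == (events[t] == "stay") else -1.0
        # coupling feature: "stay" is more plausible inside an office
        s += 0.5 if (events[t] == "stay" and regions[t] == "office") else 0.0
        if t > 0:
            # transition feature: indoor regions tend to change slowly
            s += 0.3 if regions[t] == regions[t - 1] else -0.3
    return s

def best_joint_labels(obs):
    n = len(obs)
    candidates = itertools.product(
        itertools.product(REGIONS, repeat=n),
        itertools.product(EVENTS, repeat=n),
    )
    return max(candidates, key=lambda re: score(obs, re[0], re[1]))

if __name__ == "__main__":
    observations = [{"speed": 1.1}, {"speed": 0.2}, {"speed": 0.1}]
    print(best_joint_labels(observations))
```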
Citations: 7
Contribution Maximization in Probabilistic Datalog
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00076
T. Milo, Y. Moskovitch, Brit Youngmann
The use of probabilistic Datalog programs has recently been advocated for applications that involve recursive computation and uncertainty. While using such programs allows for flexible knowledge derivation, it makes the analysis of query results a challenging task. Particularly, given a set O of output tuples and a number k, one would like to understand which k-size subset of the input tuples has contributed the most to the derivation of O. This is useful for multiple tasks, such as identifying the critical sources of errors and understanding surprising results. Previous works have mainly focused on quantifying the contribution of tuples to a query result in non-recursive SQL queries, very often disregarding probabilistic inference. To quantify the contribution in probabilistic Datalog programs, one must account for the recursive relations between input and output data, and for the uncertainty. To this end, we formalize the Contribution Maximization (CM) problem. We then reduce CM to the well-studied Influence Maximization (IM) problem, showing that techniques developed for IM can be harnessed in our setting. However, we show that such naïve adoption results in poor performance. To overcome this, we propose an optimized algorithm that injects a refined variant of the classic Magic Sets technique, integrated with a sampling method, into IM algorithms, achieving significant savings in space and execution time. Our experiments demonstrate the effectiveness of our algorithm, even where the naïve approach is infeasible.
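A minimal sketch of the IM-style greedy selection, under the strong simplifying assumption that each output tuple's lineage is a known, flat set of input tuples (the paper's setting is recursive and probabilistic): greedily pick the k input tuples that cover the most output tuples.

```python
# Toy sketch of greedy, influence-style selection: model each output
# tuple's lineage as the set of input tuples that can derive it, and
# greedily pick the k inputs covering the most outputs. This flat
# max-coverage reduction is a deliberate simplification of CM.

def greedy_contribution(lineage, k):
    """lineage: dict output_tuple -> set of contributing input tuples."""
    chosen, covered = set(), set()
    inputs = set().union(*lineage.values())
    for _ in range(k):
        def gain(t):
            return sum(1 for o, ins in lineage.items()
                       if o not in covered and t in ins)
        best = max(inputs - chosen, key=gain, default=None)
        if best is None or gain(best) == 0:
            break
        chosen.add(best)
        covered |= {o for o, ins in lineage.items() if best in ins}
    return chosen

if __name__ == "__main__":
    lineage = {"o1": {"a", "b"}, "o2": {"b"}, "o3": {"c"}, "o4": {"b", "c"}}
    print(greedy_contribution(lineage, k=2))  # picks {'b', 'c'}
```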
Citations: 1
vCBIR: A Verifiable Search Engine for Content-Based Image Retrieval
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00156
Shangwei Guo, Yang Ji, Ce Zhang, Cheng Xu, Jianliang Xu
We demonstrate vCBIR, a verifiable search engine for content-based image retrieval. vCBIR allows a small or medium-sized enterprise to outsource its image database to a cloud-based service provider while ensuring the integrity of query processing. Like other common data-as-a-service (DaaS) systems, vCBIR involves three parties: (i) the image owner, who outsources its database; (ii) the service provider, who executes the authenticated query processing; and (iii) the client, who issues search queries. By employing a novel query authentication scheme proposed in our prior work [4], the system not only supports cloud-based image retrieval but also generates a cryptographic proof for each query, with which the client can verify the integrity of the query results. During the demonstration, we will showcase the usage of vCBIR and give attendees an interactive experience of verifying query results against an untrustworthy service provider through a graphical user interface (GUI).
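The actual authentication scheme is defined in the authors' prior work [4]; the sketch below only conveys the general flavor of verifiable results using a generic Merkle-style membership proof, which is an assumption for illustration rather than vCBIR's actual construction.

```python
import hashlib

# Generic Merkle-style membership verification, shown only to convey
# how a client can check a result against an untrusted server;
# vCBIR's real authentication scheme differs and is more involved.

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_membership(leaf: bytes, proof, root: bytes) -> bool:
    """proof: list of (sibling_hash, is_left) pairs from leaf to root."""
    node = h(leaf)
    for sibling, is_left in proof:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node == root

if __name__ == "__main__":
    # Build a 2-leaf tree: root = H(H(img1) || H(img2)).
    img1, img2 = b"image-1-features", b"image-2-features"
    root = h(h(img1) + h(img2))
    proof_for_img1 = [(h(img2), False)]  # sibling is on the right
    print(verify_membership(img1, proof_for_img1, root))  # True
```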
Citations: 1
Cool, a COhort OnLine analytical processing system
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00056
Zhongle Xie, Hongbin Ying, Cong Yue, Meihui Zhang, Gang Chen, B. Ooi
With the huge volume and variety of data accumulated over the years, OnLine Analytical Processing (OLAP) systems are facing challenges in query efficiency. Furthermore, the design of existing OLAP systems cannot serve modern applications well, as they are inefficient at processing complex queries, such as cohort queries, with low query latency. In this paper, we present Cool, a cohort online analytical processing system. As an integrated system with several newly proposed operators on top of a sophisticated storage layer, it processes both cohort queries and conventional OLAP queries with superb performance. Its distributed design includes minimal load balancing and fault tolerance support and is scalable. Our evaluation shows that Cool outperforms two state-of-the-art systems, MonetDB and Druid, by a wide margin in the single-node setting. The multi-node version of Cool can also beat the distributed Druid, as well as SparkSQL, by one order of magnitude in terms of query latency.
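For readers unfamiliar with cohort queries, the toy Python below shows what such a query computes: users are grouped into cohorts by the week of their first event, and a metric is aggregated by cohort age. Field names and the metric are illustrative; Cool evaluates such queries with dedicated operators over its storage layer rather than in this naive way.

```python
from collections import defaultdict

# Toy cohort query: bucket users by the week of their first event
# (the "birth" event), then aggregate a value by weeks since joining.
# Illustrative only; not Cool's operators or storage layout.

def cohort_analysis(events):
    """events: list of (user, week, value), unsorted."""
    birth = {}
    for user, week, _ in sorted(events, key=lambda e: e[1]):
        birth.setdefault(user, week)
    table = defaultdict(float)  # (birth_week, age_in_weeks) -> total value
    for user, week, value in events:
        table[(birth[user], week - birth[user])] += value
    return dict(table)

if __name__ == "__main__":
    evts = [("u1", 0, 9.0), ("u1", 1, 4.0), ("u2", 1, 7.0), ("u2", 2, 1.0)]
    print(cohort_analysis(evts))
    # {(0, 0): 9.0, (0, 1): 4.0, (1, 0): 7.0, (1, 1): 1.0}
```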
Citations: 2
Deciding When to Trade Data Freshness for Performance in MongoDB-as-a-Service
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00207
Chenhao Huang, Michael J. Cahill, A. Fekete, Uwe Röhm
MongoDB is a popular document store that is also available as a cloud-hosted service. MongoDB internally deploys primary-copy asynchronous replication and allows clients to vary the Read Preference, so reads can deliberately be directed to secondaries rather than the primary site. Doing so can sometimes improve performance, but the returned data might be stale, whereas the primary always returns the freshest data value. The state of practice is for programmers to decide where to direct reads at application development time, when they do not yet fully understand the workload or hardware capacity. It is better to choose the appropriate Read Preference setting at runtime, as we describe in this paper. We show how a system can detect when the primary copy is saturated in MongoDB-as-a-Service, and use this to choose where reads should be done to improve overall performance. Our approach is aimed at a cloud consumer; it assumes access only to the limited diagnostic data provided to clients of the hosted service.
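Below is a sketch of what this runtime decision looks like in application code, using pymongo's real Read Preference API but a placeholder saturation signal: the paper derives its signal from the limited diagnostics a hosted service exposes, whereas the threshold, latency feed, and database names here are assumptions.

```python
from pymongo import MongoClient, ReadPreference

# Sketch of the runtime decision the paper argues for: route reads to
# secondaries only when the primary looks saturated. The saturation
# test is a hypothetical placeholder, not the paper's detector.

def primary_is_saturated(latencies_ms, threshold_ms=50.0):
    """Hypothetical signal: recent primary read latencies trending high."""
    recent = latencies_ms[-20:]
    return sum(recent) / len(recent) > threshold_ms

def choose_collection(client, latencies_ms):
    coll = client.mydb.mycoll  # illustrative database/collection names
    if primary_is_saturated(latencies_ms):
        # Trade freshness for throughput: allow possibly stale reads.
        return coll.with_options(
            read_preference=ReadPreference.SECONDARY_PREFERRED)
    return coll  # default: read from the primary, always fresh

if __name__ == "__main__":
    client = MongoClient("mongodb://localhost:27017")
    coll = choose_collection(client, latencies_ms=[60.0] * 20)
    print(coll.read_preference)
```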
Citations: 5
Efficiently Answering Span-Reachability Queries in Large Temporal Graphs
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00104
Dong Wen, Yilun Huang, Ying Zhang, Lu Qin, W. Zhang, Xuemin Lin
Reachability is a fundamental problem in graph analysis. In applications such as social networks and collaboration networks, edges are always associated with timestamps. Most existing works on reachability queries in temporal graphs assume that two vertices are related if they are connected by a path with non-decreasing edge timestamps (a time-respecting path). This assumption fails to capture the relationship between entities involved in the same group or activity when no time-respecting path connects them. In this paper, we define a new reachability model, called span-reachability, designed to relax the time-order dependency and identify relationships between entities within a given time period. We adopt the idea of two-hop cover and propose an index-based method to answer span-reachability queries. Several optimizations are also given to improve the efficiency of index construction and query processing. We conduct extensive experiments on 17 real-world datasets to show the efficiency of our proposed solution.
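Under span-reachability semantics, a query can be checked naively by a BFS restricted to edges whose timestamps fall inside the window, with no ordering constraint; the paper's contribution is a two-hop-cover index that avoids such traversals. The baseline sketch below only illustrates the semantics.

```python
from collections import deque

# Brute-force check of span-reachability (no index): u reaches v in
# window [ts, te] if v is connected to u using only edges whose
# timestamps lie in the window, regardless of edge-timestamp order.

def span_reachable(edges, u, v, ts, te):
    """edges: list of (src, dst, timestamp) for a directed graph."""
    adj = {}
    for s, d, t in edges:
        if ts <= t <= te:
            adj.setdefault(s, []).append(d)
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

if __name__ == "__main__":
    es = [(1, 2, 5), (2, 3, 3), (3, 4, 9)]  # 5 -> 3 is not time-respecting
    print(span_reachable(es, 1, 4, ts=1, te=10))  # True under span semantics
```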
Citations: 13
UniKV: Toward High-Performance and Scalable KV Storage in Mixed Workloads via Unified Indexing
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00034
Qiang Zhang, Yongkun Li, P. Lee, Yinlong Xu, Qiu Cui, L. Tang
Persistent key-value (KV) stores are mainly designed around the Log-Structured Merge-tree (LSM-tree), which suffers from large read and write amplification, especially as KV stores grow in size. Existing design optimizations for LSM-tree-based KV stores often make trade-offs and fail to improve read and write performance simultaneously on large KV stores without sacrificing scan performance. We design UniKV, which unifies the key design ideas of hash indexing and the LSM-tree in a single system. Specifically, UniKV leverages data locality to differentiate the indexing management of KV pairs. It also develops multiple techniques to tackle the issues caused by unifying the indexing techniques, so as to improve performance in reads, writes, and scans simultaneously. Experiments show that UniKV significantly outperforms several state-of-the-art KV stores (e.g., LevelDB, RocksDB, HyperLevelDB, and PebblesDB) in overall throughput under read-write mixed workloads.
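The toy store below conveys the unified-indexing idea in miniature: recent writes sit in a hash-indexed unsorted tier for fast point access, and a flush moves them into a sorted tier that serves range scans. The structure and thresholds are illustrative assumptions, not UniKV's actual design.

```python
from bisect import insort, bisect_left, bisect_right

# Toy two-tier KV store in the spirit of unified indexing: a
# hash-indexed write-optimized tier plus a sorted tier for scans.
# Illustrative only; UniKV's real layers and flushing differ.

class TwoTierKV:
    def __init__(self, flush_threshold=4):
        self.hot = {}              # hash-indexed, write-optimized tier
        self.sorted_keys = []      # sorted key list for range scans
        self.cold = {}
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.hot[key] = value
        if len(self.hot) >= self.flush_threshold:
            self._flush()

    def get(self, key):
        if key in self.hot:        # one hash probe for recent data
            return self.hot[key]
        return self.cold.get(key)

    def scan(self, lo, hi):
        self._flush()              # simplify: scan only the sorted tier
        i, j = bisect_left(self.sorted_keys, lo), bisect_right(self.sorted_keys, hi)
        return [(k, self.cold[k]) for k in self.sorted_keys[i:j]]

    def _flush(self):
        for k, v in self.hot.items():
            if k not in self.cold:
                insort(self.sorted_keys, k)
            self.cold[k] = v
        self.hot.clear()

if __name__ == "__main__":
    kv = TwoTierKV()
    for i in [5, 1, 9, 3, 7]:
        kv.put(i, f"v{i}")
    print(kv.get(9), kv.scan(2, 8))
```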
Citations: 11
Task Allocation in Dependency-aware Spatial Crowdsourcing
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00090
Wangze Ni, Peng Cheng, Lei Chen, Xuemin Lin
Ubiquitous smart devices and high-quality wireless networks enable people to easily participate in spatial crowdsourcing tasks, which require workers to physically move to specific locations to conduct their assigned tasks. Spatial crowdsourcing has attracted much attention from both academia and industry. In this paper, we consider a spatial crowdsourcing scenario in which tasks may have dependencies among them; specifically, a task can only be dispatched once its dependent tasks have been assigned. In fact, task dependencies are quite common in many real-life applications, such as house repairs and the organization of sports games. We formally define dependency-aware spatial crowdsourcing (DA-SC), which focuses on finding an optimal worker-and-task assignment under constraints on dependencies, worker skills, moving distances, and deadlines, so as to maximize the number of successfully assigned tasks. We prove that the DA-SC problem is NP-hard and thus intractable. Therefore, we propose two approximation algorithms, a greedy approach and a game-theoretic approach, both of which guarantee approximation bounds on the results in each batch process. Through extensive experiments on both real and synthetic datasets, we demonstrate the efficiency and effectiveness of our DA-SC approaches.
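A minimal sketch in the spirit of the greedy approximation: repeatedly dispatch a ready task (all of its dependencies already assigned) to the nearest feasible worker. Skill matching, deadlines, and the game-theoretic variant are omitted, and all structures below are illustrative assumptions.

```python
import math

# Greedy sketch of dependency-aware assignment: a task is "ready" once
# every dependency is assigned, and each ready task takes the nearest
# free worker within range. Simplified to one task per worker;
# not the paper's full DA-SC algorithms.

def greedy_assign(tasks, workers, deps, max_dist):
    """tasks/workers: dict id -> (x, y); deps: task -> set of tasks."""
    assigned, result = set(), {}
    free_workers = dict(workers)
    progress = True
    while progress:
        progress = False
        for t, loc in tasks.items():
            if t in assigned or not deps.get(t, set()) <= assigned:
                continue  # not ready: some dependency is unassigned
            feasible = [(math.dist(loc, wloc), w)
                        for w, wloc in free_workers.items()
                        if math.dist(loc, wloc) <= max_dist]
            if feasible:
                _, w = min(feasible)
                result[t] = w
                assigned.add(t)
                del free_workers[w]
                progress = True
    return result

if __name__ == "__main__":
    tasks = {"t1": (0, 0), "t2": (1, 1)}
    workers = {"w1": (0, 1), "w2": (2, 1)}
    deps = {"t2": {"t1"}}          # t2 can be dispatched only after t1
    print(greedy_assign(tasks, workers, deps, max_dist=2.0))
```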
Citations: 24