2020 IEEE 36th International Conference on Data Engineering (ICDE): Latest Papers

MC-Explorer: Analyzing and Visualizing Motif-Cliques on Large Networks
Pub Date: 2020-04-01 DOI: 10.1109/ICDE48307.2020.00154 Pages: 1722-1725
Boxuan Li, Reynold Cheng, Jiafeng Hu, Yixiang Fang, Min Ou, Ruibang Luo, K. Chang, Xuemin Lin
Large networks with labeled nodes are prevalent in various applications, such as biological graphs, social networks, and e-commerce graphs. To extract insight from this rich information source, we propose MC-Explorer, an advanced analysis and visualization system. A highlight of MC-Explorer is its ability to discover motif-cliques from a graph with labeled nodes. A motif, such as a 3-node triangle, is a fundamental building block of a graph. A motif-clique is a "complete" subgraph in a network with respect to a desired higher-order connection pattern. For example, on a large biological graph, we found motif-cliques that disclose new side effects of a drug and potential drugs for treating diseases. MC-Explorer includes online and interactive facilities for exploring a large labeled network through the use of motif-cliques. We will demonstrate how MC-Explorer can facilitate the analysis and visualization of a labeled biological network. An online demo video of MC-Explorer can be accessed from https://www.dropbox.com/s/vkalumc28wqp8yl/demo.mov
Citations: 4
Shortest Path Queries for Indoor Venues with Temporal Variations
Pub Date: 2020-04-01 DOI: 10.1109/ICDE48307.2020.00227 Pages: 2014-2017
Tiantian Liu, Zijin Feng, Huan Li, Hua Lu, M. A. Cheema, Hong Cheng, Jianliang Xu
Indoor shortest path query (ISPQ) is of fundamental importance for indoor location-based services (LBS). However, existing ISPQs ignore indoor temporal variations, e.g., the open and close times associated with entities like doors and rooms. In this paper, we define a new type of query called Indoor Temporal-variation aware Shortest Path Query (ITSPQ). It returns the valid shortest path based on the up-to-date indoor topology at the query time. A set of techniques is designed to answer ITSPQ efficiently. We design a graph structure (IT-Graph) that captures indoor temporal variations. To process ITSPQ using IT-Graph, we design two algorithms that check a door’s accessibility synchronously and asynchronously, respectively. We experimentally evaluate the proposed techniques using synthetic data. The results show that our methods are efficient.
Citations: 13
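The idea in the abstract above of answering a shortest path query against the indoor topology that is valid at query time can be sketched as follows. This is a minimal illustration, not the paper's IT-Graph or either of its two door-checking algorithms: it assumes rooms are graph nodes, each edge passes through a named door, and a door is usable only if the query time falls inside one of its opening intervals. The names `is_open`, `snapshot_shortest_path`, `edges`, and `open_hours` are hypothetical, and the sketch ignores doors that change state while the user is walking.

```python
import heapq

def is_open(intervals, t):
    """Return True if time t lies in any half-open [start, end) interval."""
    return any(start <= t < end for start, end in intervals)

def snapshot_shortest_path(edges, open_hours, source, target, query_time):
    """Dijkstra over rooms, skipping doors that are closed at query_time.

    edges: {room: [(neighbor_room, door_id, distance), ...]}
    open_hours: {door_id: [(start, end), ...]}; doors not listed are
                treated as always open (an assumption of this sketch).
    Returns (distance, path), or (None, []) if no valid path exists.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, room = heapq.heappop(heap)
        if d > dist[room]:
            continue  # stale heap entry
        if room == target:
            path = [room]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for neighbor, door, length in edges.get(room, []):
            if door in open_hours and not is_open(open_hours[door], query_time):
                continue  # door closed in the snapshot at query time
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = room
                heapq.heappush(heap, (nd, neighbor))
    return None, []

# Hypothetical venue: door d2 is open only between 9 and 17 (hours).
edges = {
    "lobby": [("hall", "d1", 5.0), ("office", "d2", 2.0)],
    "office": [("lab", "d3", 2.0)],
    "hall": [("lab", "d4", 4.0)],
}
open_hours = {"d2": [(9, 17)]}
```

With this data, a query at hour 10 goes through the office, while a query at hour 20 has to detour through the hall because door `d2` is closed.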
Advances in Cryptography and Secure Hardware for Data Outsourcing
Pub Date: 2020-04-01 DOI: 10.1109/ICDE48307.2020.00173 Pages: 1798-1801
Shantanu Sharma, A. Burtsev, S. Mehrotra
Despite extensive research, secure outsourcing remains an open challenge. This tutorial focuses on recent advances in secure cloud-based data outsourcing based on cryptographic (encryption, secret-sharing, and multi-party computation (MPC)) and hardware-based approaches. We highlight the strengths and weaknesses of state-of-the-art techniques, and conclude that no single approach is likely to emerge as a silver bullet. Thus, the key is to merge different hardware and software techniques to work in conjunction using partitioned computing, wherein a computation is carefully split across different cryptographic techniques so as not to compromise security. We highlight some recent work in that direction.
Citations: 6
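Secret-sharing, one of the cryptographic approaches named in the abstract above, can be illustrated with a small additive scheme. This is a generic textbook sketch rather than anything taken from the tutorial: a value is split into random shares that sum to it modulo a fixed modulus, so any proper subset of the shares reveals nothing about the value, and servers holding shares can add two shared values without seeing either. The modulus `MODULUS` and the function names are assumptions chosen for illustration.

```python
import secrets

# Assumed modulus for this sketch: the Mersenne prime 2**61 - 1.
MODULUS = 2**61 - 1

def share(secret, n):
    """Split `secret` into n additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    # The last share is chosen so that all shares sum to the secret.
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % MODULUS

def add_shared(a_shares, b_shares):
    """Each server adds its own two shares locally; no secret is revealed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
```

For example, `reconstruct(add_shared(share(10, 3), share(32, 3)))` recovers 42 even though no single server ever holds 10, 32, or 42 in the clear.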
Low-Latency Communication for Fast DBMS Using RDMA and Shared Memory
Pub Date: 2020-04-01 DOI: 10.1109/ICDE48307.2020.00131 Pages: 1477-1488
Philipp Fent, Alexander van Renen, Andreas Kipf, Viktor Leis, Thomas Neumann, A. Kemper
While hardware and software improvements have greatly accelerated modern database systems' internal operations, the decades-old stream-based Socket API for external communication is still unchanged. We show experimentally that, for modern high-performance systems, networking has become a performance bottleneck. Therefore, we argue that the communication stack needs to be redesigned to fully exploit modern hardware, as has already happened to most other database system components. We propose L5, a high-performance communication layer for database systems. L5 rethinks the flow of data in and out of the database system and is based on direct memory access techniques for intra-datacenter (RDMA) and intra-machine communication (Shared Memory). With L5, we provide a building block to accelerate ODBC-like interfaces with a unified and message-based communication framework. Our results show that using interconnects like RDMA (InfiniBand), RoCE (Ethernet), and Shared Memory (IPC), L5 can largely eliminate the network bottleneck for database systems.
Citations: 27
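The message-based, shared-memory style of intra-machine communication mentioned in the abstract above can be sketched with Python's standard `multiprocessing.shared_memory` module. This is only an illustration of passing a length-prefixed message through a shared buffer; it is not the L5 protocol, and the 4-byte header layout and the helper names `write_message`, `read_message`, and `roundtrip` are assumptions made for this sketch. For brevity both endpoints live in one process.

```python
import struct
from multiprocessing import shared_memory

# Assumed framing for this sketch: 4-byte little-endian length, then payload.
HEADER = struct.Struct("<I")

def write_message(buf, payload):
    """Copy a length-prefixed message into the shared buffer."""
    if HEADER.size + len(payload) > len(buf):
        raise ValueError("message does not fit in the shared buffer")
    HEADER.pack_into(buf, 0, len(payload))
    buf[HEADER.size:HEADER.size + len(payload)] = payload

def read_message(buf):
    """Read one length-prefixed message back out of the shared buffer."""
    (length,) = HEADER.unpack_from(buf, 0)
    return bytes(buf[HEADER.size:HEADER.size + length])

def roundtrip(payload, size=4096):
    """Write through one mapping and read through a second mapping."""
    producer = shared_memory.SharedMemory(create=True, size=size)
    try:
        write_message(producer.buf, payload)
        # A real client would attach by name from another process.
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            return read_message(consumer.buf)
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()
```

The point of the sketch is that the payload is written once into memory that both sides map, instead of being copied through a byte-stream socket; synchronization between writer and reader is deliberately left out.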
Demythization of Structural XML Query Processing: Comparison of Holistic and Binary Approaches (Extended Abstract)
Pub Date: 2020-04-01 DOI: 10.1109/ICDE48307.2020.00234 Pages: 2030-2031
Petr Lukáš, Radim Bača, M. Krátký, T. Ling
Structural XQuery and XPath queries are often modeled by twig pattern queries (TPQs) specifying predicates on XML nodes and structural relationships to be satisfied between them. This paper considers a TPQ model extended by a specification of output and non-output query nodes, since it complies with the XQuery and XPath semantics. There are two types of TPQ processing approaches: binary joins and holistic joins. Binary joins use a query plan of interconnected binary operators, whereas holistic joins are based on one complex operator that processes the whole query. In recent years, holistic joins have been considered the state-of-the-art TPQ processing method. However, a thorough analytical and experimental comparison of binary and holistic joins has been missing despite an enormous research effort in this area. In this paper, we try to fill this gap. We introduce several improvements to the binary join operators which enable us to build a so-called fully-pipelined (FP) query plan for any TPQ with the specification of output and non-output query nodes. We analytically show that, for a class of queries, the proposed approach has the same time and space complexity as holistic joins, and we experimentally demonstrate that the proposed approach outperforms holistic joins in many cases.
Citations: 0
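A single binary structural join, the building block of the binary-join query plans described in the abstract above, can be sketched as a stack-based ancestor-descendant join. This is the classic textbook form, not the improved operators or the fully-pipelined plan the paper introduces. It assumes each XML node carries a region encoding `(start, end)` from a document-order traversal, so that node `a` is an ancestor of node `d` exactly when `a.start < d.start` and `d.end < a.end`; the function name and tuple layout are hypothetical.

```python
def structural_join(ancestors, descendants):
    """Return all (a, d) pairs where a is an ancestor of d.

    Both inputs are lists of (start, end) regions sorted by start.
    Regions from one XML tree are either nested or disjoint, which is
    what lets a single stack hold every currently open ancestor.
    """
    result = []
    stack = []  # nested chain of ancestor candidates, outermost first
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            while stack and stack[-1][1] < a[0]:
                stack.pop()  # region ended before a starts
            stack.append(a)
            i += 1
        # Drop candidates that ended before d starts.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Everything left on the stack encloses d.
        result.extend((a, d) for a in stack if d[1] < a[1])
    return result

# Hypothetical document: <a>(1,10) holds <b>(2,5) and <b>(6,9); each <b>
# holds one <c>, at (3,4) and (7,8); a further <c>(11,12) lies outside <a>.
b_nodes = [(2, 5), (6, 9)]
c_nodes = [(3, 4), (7, 8), (11, 12)]
```

Evaluating `//b//c` on this data with `structural_join(b_nodes, c_nodes)` yields the two nested pairs and skips the `c` node outside the tree rooted at `a`. Each input list is scanned once; a full twig query would chain several such operators.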
Matrix Profile XVII: Indexing the Matrix Profile to Allow Arbitrary Range Queries
Pub Date: 2020-04-01 DOI: 10.1109/ICDE48307.2020.00185 Pages: 1846-1849
Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Eamonn J. Keogh
Since its introduction several years ago, the Matrix Profile has received significant attention for two reasons. First, it is a very general representation, allowing for the discovery of time series motifs, discords, chains, joins, shapelets, segmentations, etc. Second, it can be computed very efficiently, allowing for fast exact computation and ultra-fast approximate computation. For analysts who use the Matrix Profile frequently, its incremental computability means that they can perform ad-hoc analytics at any time, with almost no delay. However, they can only issue global queries, that is, queries that consider all the data from time zero to the current time. This is a significant limitation, as they may be interested in localized questions about a contiguous subset of the data. For example, "do we have any unusual motifs that correspond with that unusually cool summer two years ago?" Such ad-hoc queries would require recomputing the Matrix Profile for the time period in question. This is not an untenable computation, but it cannot be done in interactive time. In this work we introduce a novel indexing framework that allows queries about arbitrary ranges to be answered in quasilinear time, allowing such queries to be interactive for the first time.
Citations: 9
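The cost of recomputing the Matrix Profile for a sub-range, which the abstract above describes as feasible but not interactive, can be seen in a brute-force baseline. The sketch below is not the paper's indexing framework: it simply recomputes, for every length-`m` subsequence inside `series[start:end]`, the z-normalized Euclidean distance to its nearest non-trivial neighbor in the same range. The exclusion zone of `m // 2` and all function names are assumptions of this sketch.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]

def range_matrix_profile(series, m, start=0, end=None):
    """Brute-force matrix profile restricted to series[start:end]."""
    window = series[start:end]
    n = len(window) - m + 1
    if n < 2:
        raise ValueError("range too short for subsequence length m")
    subs = [znorm(window[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # assumed trivial-match exclusion zone
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) < exclusion:
                continue  # skip trivial matches near i
            best = min(best, math.dist(subs[i], subs[j]))
        profile.append(best)
    return profile

# Hypothetical series: the shape [0, 1, 0] occurs at offsets 0 and 6.
series = [0, 1, 0, 5, 2, 7, 0, 1, 0, 3, 9, 4]
```

Over the full series the subsequence at offset 0 has an exact match at offset 6, so its profile value is zero; restricting the query to `series[0:6]` removes that match and the value becomes positive. Each range query here costs on the order of n² · m operations, which is the kind of recomputation the indexing approach aims to avoid.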
Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks
Pub Date: 2020-04-01 DOI: 10.1109/ICDE48307.2020.00085 Pages: 925-936
Jong-Seop Lee, Seokwon Kang, Yongseung Yu, Yong-Yeon Jo, Sang-Wook Kim, Yongjun Park
Sparse matrix multiplication (spGEMM) is widely used to analyze sparse network data and to extract important information based on a matrix representation. As it contains a high degree of data parallelism, many efficient implementations using data-parallel programming platforms such as CUDA and OpenCL have been introduced on graphics processing units (GPUs). Several well-known spGEMM techniques, such as cuSPARSE and CUSP, often do not utilize the GPU resources fully, owing to the load imbalance between threads in the expansion process and high memory contention in the merge process. Furthermore, even though several outer-product-based spGEMM techniques have been proposed to solve the load balancing problem in expansion, they still do not utilize the GPU resources fully, because severe computation load variations exist among the multiple thread blocks. To solve these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computations of each computing unit on target GPUs, based on the outer-product-based expansion process, and reduces the memory pressure during the merge process. For expansion, it first identifies the actual computation amount for each block, and then performs two thread block transformation processes based on their characteristics: 1) B-Splitting to transform a heavy-computation block into multiple small blocks and 2) B-Gathering to aggregate multiple small-computation blocks into a larger block. While merging, it improves the overall performance by performing B-Limiting to limit the number of blocks on each computing unit. Experimental results show that it improves the total performance of kernel execution by 1.43x on average, compared to the row-product-based spGEMM, for NVIDIA Titan Xp GPUs on real-world datasets.
Citations: 13
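The expansion and merge phases of outer-product-based spGEMM named in the abstract above, along with the per-block computation amount that drives load balancing, can be sketched on the CPU as follows. This is a plain Python illustration, not the Block Reorganizer pass or any GPU kernel. It assumes that one "block" handles one outer product (column k of A times row k of B), which is a simplification made for this sketch; the function names and the dictionary-based sparse layout are hypothetical.

```python
from collections import defaultdict

def outer_product_spgemm(a_cols, b_rows):
    """Compute C = A @ B from column-wise A and row-wise B.

    a_cols[k] = [(i, A[i][k]), ...]  non-zeros of column k of A
    b_rows[k] = [(j, B[k][j]), ...]  non-zeros of row k of B
    Returns {(i, j): C[i][j]} for the non-zero entries produced.
    """
    # Expansion: every outer product emits partial products.
    partial = []
    for k in sorted(a_cols.keys() & b_rows.keys()):
        for i, a in a_cols[k]:
            for j, b in b_rows[k]:
                partial.append((i, j, a * b))
    # Merge: partial products with the same coordinate are summed.
    c = defaultdict(float)
    for i, j, value in partial:
        c[(i, j)] += value
    return dict(c)

def block_workloads(a_cols, b_rows):
    """Number of multiplications each outer product (block) performs."""
    return {k: len(a_cols[k]) * len(b_rows[k])
            for k in sorted(a_cols.keys() & b_rows.keys())}

# Hypothetical input: A = [[1, 0], [2, 3]], B = [[4, 5], [0, 6]].
a_cols = {0: [(0, 1.0), (1, 2.0)], 1: [(1, 3.0)]}
b_rows = {0: [(0, 4.0), (1, 5.0)], 1: [(1, 6.0)]}
```

For this input, `block_workloads` reports 4 multiplications for block 0 and 1 for block 1. That skew between blocks is the kind of imbalance the abstract's splitting and gathering steps are meant to even out, and the coordinate collisions in the merge step (here on entry `(1, 1)`) are where memory contention arises.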
SCLPD: Smart Cargo Loading Plan Decision Framework
Pub Date: 2020-04-01 DOI: 10.1109/ICDE48307.2020.00163 Pages: 1758-1761
Jiaye Liu, Jiali Mao, Jiajun Liao, Huiqi Hu, Ye Guo, Aoying Zhou
The rapid development of the steel logistics industry has still not effectively addressed issues such as truck overload, overdue orders, and cargo overstock. One reason lies in the limited number of trucks available for transporting large-scale cargos. More importantly, traditional methods tend to distribute cargos to trucks with the aim of maximizing the load of each truck, but they ignore the priority level of orders and the expiration date of cargos stored in the warehouses, both of which critically influence the profits of the steel logistics industry. Hence, an appropriate cargo distribution mechanism is needed that, under limited transportation capacity, maximizes the delivery proportion of high-priority cargos. Recently, a tremendous amount of logistics data has been produced on steel logistics platforms, and it grows hourly. However, no existing solution transforms such data into an actionable scheme for improving cargo distribution effectiveness. This paper puts forward a system implementation of a smart cargo loading plan decision framework (SCLPD for short) for the steel logistics industry. Through analysis of extensive real data on cargo loading plans and warehouse inventory, some important rules related to the cargo distribution process are extracted. Additionally, considering that different numbers of trucks arrive in different time periods, a two-layer searching mechanism consisting of a genetic algorithm and the A* algorithm is designed, based on an adaptive time window model, to ensure global optimization of the cargo loading plan for the trucks in all time periods. In our demonstration, we illustrate the procedure of matching cargos and trucks in various time windows, and showcase experimental results comparing the traditional method with SCLPD, measured by the delivery proportion of high-priority cargos. The effectiveness and practicality of SCLPD enable efficient cargo loading plan generation that meets the real-world requirements of steel logistics platforms.
Citations: 3
TransN: Heterogeneous Network Representation Learning by Translating Node Embeddings
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00057
Zijian Li, Wenhao Zheng, Xueling Lin, Ziyuan Zhao, Zhe Wang, Yue Wang, Xun Jian, Lei Chen, Qiang Yan, Tiezheng Mao
Learning network embeddings has attracted growing attention in recent years. However, most of the existing methods focus on homogeneous networks, which cannot capture the important type information in heterogeneous networks. To address this problem, in this paper, we propose TransN, a novel multi-view network embedding framework for heterogeneous networks. Compared with the existing methods, TransN is an unsupervised framework which does not require node labels or user-specified meta-paths as inputs. In addition, TransN is capable of handling more general types of heterogeneous networks than the previous works. Specifically, in our framework TransN, we propose a novel algorithm to capture the proximity information inside each single view. Moreover, to transfer the learned information across views, we propose an algorithm to translate the node embeddings between different views based on the dual-learning mechanism, which can both capture the complex relations between node embeddings in different views, and preserve the proximity information inside each view during the translation. We conduct extensive experiments on real-world heterogeneous networks, whose results demonstrate that the node embeddings generated by TransN outperform those of competitors in various network mining tasks.
2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 589-600
Citations: 9
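The TransN abstract above describes translating node embeddings between views with a dual-learning mechanism but gives no formulas. As a loose illustration only, the sketch below computes a generic two-way translation loss with round-trip consistency terms, which is one common way dual-learning objectives are written. The linear translators, the squared-error terms, and every name here (`dual_translation_loss`, `a_to_b`, `b_to_a`) are assumptions for illustration and are not taken from the paper.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dual_translation_loss(emb_a, emb_b, a_to_b, b_to_a):
    """Hypothetical dual loss over nodes that appear in both views.

    emb_a, emb_b: dicts mapping node id -> embedding in view A / view B.
    a_to_b, b_to_a: linear translators (assumed form, not the paper's).
    """
    total = 0.0
    for node in emb_a.keys() & emb_b.keys():
        xa, xb = emb_a[node], emb_b[node]
        xa_in_b = matvec(a_to_b, xa)  # view A embedding translated to view B
        xb_in_a = matvec(b_to_a, xb)  # view B embedding translated to view A
        # Translation terms: a translated embedding should match the target view.
        total += squared_distance(xa_in_b, xb)
        total += squared_distance(xb_in_a, xa)
        # Round-trip terms: translating there and back should recover the input.
        total += squared_distance(matvec(b_to_a, xa_in_b), xa)
        total += squared_distance(matvec(a_to_b, xb_in_a), xb)
    return total


if __name__ == "__main__":
    swap = [[0.0, 1.0], [1.0, 0.0]]  # toy translator that swaps coordinates
    view_a = {"u": [1.0, 0.0]}
    view_b = {"u": [0.0, 1.0]}
    print(dual_translation_loss(view_a, view_b, swap, swap))
```

In a real system the translators would be trained to minimise such a loss; the sketch only evaluates it for fixed matrices.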
TIDY: Publishing a Time Interval Dataset with Differential Privacy (Extended abstract)
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00229
Woohwan Jung, Suyong Kwon, Kyuseok Shim
Log data from mobile devices usually contain a series of events with time intervals. However, the problem of releasing differentially private time interval data has not been tackled yet. We propose the TIDY (publishing Time Intervals via Differential privacY) algorithm to release time interval data under differential privacy. We use the frequency vectors as a compact representation of the time interval data to reduce the aggregated noise. We also develop a new partitioning method adapted for the frequency vectors to balance the trade-off between the noise and structural errors. Our experiments confirm that TIDY outperforms the existing algorithms for releasing 2D histograms.
2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 2020-2021
Citations: 0
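The TIDY abstract above does not spell out how its frequency vectors are built or perturbed. As a rough illustration of the general idea only, the sketch below applies the standard Laplace mechanism to a histogram of interval durations. The bucketing scheme, the function names, and the assumption that each record contributes exactly one interval (so the L1 sensitivity is 1) are hypothetical and are not taken from the paper; TIDY's partitioning method is not reproduced here.

```python
import random


def interval_frequency_vector(intervals, num_buckets, bucket_width):
    """Count intervals by duration bucket (an assumed, simplified representation)."""
    counts = [0] * num_buckets
    for start, end in intervals:
        duration = end - start
        # Durations beyond the last bucket are clamped into it.
        idx = min(int(duration // bucket_width), num_buckets - 1)
        counts[idx] += 1
    return counts


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_noisy_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism: add noise of scale sensitivity / epsilon to each count."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


if __name__ == "__main__":
    # Toy (start, end) intervals in arbitrary time units.
    logs = [(0, 5), (2, 4), (10, 30), (1, 100)]
    counts = interval_frequency_vector(logs, num_buckets=4, bucket_width=10)
    print(counts)
    print(release_noisy_counts(counts, epsilon=1.0))
```

A smaller `epsilon` gives stronger privacy but larger noise per bucket, which is the trade-off between noise and structural error that the abstract says the partitioning method balances.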