
Latest publications: 2017 IEEE High Performance Extreme Computing Conference (HPEC)

Database engine integration and performance analysis of the BigDAWG polystore system
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091081
Katherine Yu, V. Gadepally, M. Stonebraker
The BigDAWG polystore database system aims to address workloads dealing with large, heterogeneous datasets. The need for such a system is motivated by the rise of Big Data applications dealing with disparate types of data, from large-scale analytics to real-time data streams to text-based records, each suited to a different storage engine. These applications often perform cross-engine queries on correlated data, resulting in complex query planning, data migration, and execution. One such application is a medical application built by the Intel Science and Technology Center (ISTC) on data collected from an intensive care unit (ICU). We present work done to add support for two commonly used database engines, Vertica and MySQL, to the BigDAWG system, as well as results and analysis from a performance evaluation of the system using the TPC-H benchmark.
Citations: 4
Efficient parallel streaming algorithms for large-scale inverse problems
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091033
H. Sundar
Large-scale inverse problems and uncertainty quantification (UQ), i.e., quantifying uncertainties in complex mathematical models and their large-scale computational implementations, are among the outstanding challenges in computational science and will be a driver for the acquisition of future supercomputers. These methods generate significant amounts of simulation data that are used by other parts of the computation in a complex fashion, requiring large in-memory storage and/or redundant computations. We present a streaming algorithm for such computations that achieves high performance without requiring additional in-memory storage or additional computations. By reducing the memory footprint of the application, we are able to achieve a significant speedup (∼3×) by operating in a more favorable region of the strong-scaling curve.
Citations: 0
An introduction to an array memory processor for application specific acceleration
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091069
G. Pechanek, N. Pitsianis
In this paper, we introduce an Array Memory (AM) processor. The AM processor uses a shared memory network amenable to on-chip 3D stacking. Node couplings use a 1 to K adjacency of connections in each dimension of communication of an array of nodes, such as an R×C array where R ≥ K and C ≥ K and K is a positive odd integer. This design also provides data sharing between processors within sub-arrays of the R × C array to support high-performance programmable application specific processing. A new instruction set architecture is proposed that has arithmetic instructions that do not require the specification of any source or target operand addresses. The source operands and target values are provided by separate load, store, and arithmetic instructions that are appropriately scheduled with the arithmetic instruction to be executed to reduce the storage of temporary variables for lower power implementations.
Citations: 1
Software-defined extreme scale networks for bigdata applications
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091087
Haitham Ghalwash, Chun-Hsi Huang
Software-Defined Networking (SDN) is an emerging technology that supports modern network applications. An SDN redefines the network by decoupling the control plane from the data plane, thus providing centralized management, programmability, and dynamic reconfiguration. In this research, we specifically investigate the significance of using SDNs in support of Big-Data applications, which SDNs support through more efficient use of distributed nodes. With Hadoop as an example Big-Data application, we investigate performance in terms of throughput and execution time for read/write and sorting operations. The experiments take into consideration different network sizes of a Fat-tree topology. A Hadoop multi-node cluster is installed in Docker containers connected through a Fat-tree of OpenFlow switches, with packet forwarding handled either by an SDN controller or by normal packet-switching rules. Experimental results show that using an SDN controller outperforms normal forwarding by the switches. As a result, our research suggests that SDN controllers have strong potential to greatly enhance the performance of Big-Data applications on extreme-scale networks.
Citations: 6
Scalable stochastic block partition
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091050
Ahsen J. Uppal, Guy Swope, H. H. Huang, The George Washington
The processing of graph data at large scale, though important and useful for real-world applications, continues to be challenging, particularly for problems such as graph partitioning. Optimal graph partitioning is NP-hard, but several methods provide approximate solutions in reasonable time. Yet scaling these approximate algorithms is also challenging. In this paper, we describe our efforts towards improving the scalability of one such technique, stochastic block partition, which is the baseline algorithm for the IEEE HPEC Graph Challenge [1]. Our key contributions are: improvements to the parallelization of the baseline bottom-up algorithm, especially the Markov Chain Monte Carlo (MCMC) nodal updates for Bayesian inference; a new top-down divide and conquer algorithm capable of reducing the algorithmic complexity of static partitioning and also suitable for streaming partitioning; a parallel single-node multi-CPU implementation and a parallel multi-node MPI implementation. Although our focus is on algorithmic scalability, our Python implementation obtains a speedup of 1.65× over the fastest baseline parallel C++ run at a graph size of 100k vertices divided into 8 subgraphs on a multi-CPU single node machine. It achieves a speedup of 61× over itself on a cluster of 4 machines with 256 CPUs for a 20k node graph divided into 4 subgraphs, and 441× speedup over itself on a 50k node graph divided into 8 subgraphs on a multi-CPU single node machine.
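The MCMC nodal updates mentioned above can be pictured as a Metropolis-style sweep over vertices. The sketch below is illustrative only: the `score(node, block)` callback is a hypothetical stand-in for the Bayesian log-posterior change used by the actual stochastic block partition baseline, and the function names are ours.

```python
import math
import random

def mcmc_nodal_updates(nodes, num_blocks, assignment, score, sweeps=10, beta=3.0):
    """Metropolis-style nodal updates: propose moving each node to a random
    block and accept with probability min(1, exp(beta * gain)), where gain is
    the change in a caller-supplied score (e.g., a Bayesian log-posterior)."""
    for _ in range(sweeps):
        for v in nodes:
            proposal = random.randrange(num_blocks)
            if proposal == assignment[v]:
                continue
            gain = score(v, proposal) - score(v, assignment[v])
            if gain >= 0 or random.random() < math.exp(beta * gain):
                assignment[v] = proposal
    return assignment
```

In the baseline algorithm the proposal distribution and acceptance ratio are derived from the degree-corrected stochastic block model rather than from a generic score function, and the sweeps run in parallel, which is where the paper's contributions lie.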
Citations: 12
A quantitative and qualitative analysis of tensor decompositions on spatiotemporal data
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091028
Thomas Henretty, M. Baskaran, J. Ezick, David Bruns-Smith, T. Simon
With the recent explosion of systems capable of generating and storing large quantities of GPS data, there is an opportunity to develop novel techniques for analyzing and gaining meaningful insights into this spatiotemporal data. In this paper we examine the application of tensor decompositions, a high-dimensional data analysis technique, to georeferenced data sets. Guidance is provided on fitting spatiotemporal data into the tensor model and analyzing the results. We find that tensor decompositions provide insight and that future research into spatiotemporal tensor decompositions for pattern detection, clustering, and anomaly detection is warranted.
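As background for the kind of analysis described, the workhorse tensor decomposition for such data is often CP (CANDECOMP/PARAFAC). A minimal textbook alternating-least-squares sketch for a dense 3-way tensor might look as follows; this is our generic illustration, not the authors' implementation.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: row (j, k) holds B[j, :] * C[k, :]."""
    r = B.shape[1]
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, r)

def cp_als(X, rank, iters=200, seed=0):
    """Plain CP-ALS for a 3-way tensor: alternately solve a least-squares
    problem for each factor matrix against a matricization of X."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, -1)                      # mode-1 unfolding
    X2 = X.transpose(1, 0, 2).reshape(J, -1)   # mode-2 unfolding
    X3 = X.transpose(2, 0, 1).reshape(K, -1)   # mode-3 unfolding
    for _ in range(iters):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C
```

For spatiotemporal data, the modes would typically be something like latitude-bin × longitude-bin × time-bin, with each rank-1 component interpreted as a recurring spatial pattern with a temporal profile.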
Citations: 12
Distributed-memory fast maximal independent set
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091032
Thejaka Amila Kanewala, Marcin Zalewski, A. Lumsdaine
The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby's seminal MIS algorithms, "Luby(A)" and "Luby(B)," to distributed-memory execution, and we evaluate their performance. We compare our results with the "Filtered MIS" implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
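For reference, the random-priority round structure shared by Luby's algorithms can be sketched sequentially; each round is what the parallel and distributed variants execute concurrently. This sketch is ours and is unrelated to the paper's distributed-memory implementation.

```python
import random

def luby_mis(adj):
    """Luby-style randomized MIS on an undirected graph given as
    {vertex: set(neighbors)}: each round, every live vertex draws a random
    priority; a vertex enters the MIS when its priority beats all live
    neighbors, then it and its neighbors are removed from the graph."""
    live = set(adj)
    mis = set()
    while live:
        prio = {v: random.random() for v in live}
        winners = {v for v in live
                   if all(prio[v] < prio[u] for u in adj[v] if u in live)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & live
        live -= removed
    return mis
```

Each round removes at least the globally minimum-priority live vertex, so termination is guaranteed; in expectation a constant fraction of edges disappears per round, which is what makes the PRAM analysis work.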
Citations: 1
xDCI, a data science cyberinfrastructure for interdisciplinary research
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091022
A. Krishnamurthy, K. Bradford, Chris Calloway, C. Castillo, Mike C. Conway, Jason Coposky, Yue Guo, R. Idaszak, W. Lenhardt, K. Robasky, Terrell G. Russell, Erik Scott, Marcin Sliwowski, M. Stealey, Kelsey Urgo, Hao Xu, H. Yi, S. Ahalt
This paper introduces xDCI, a Data Science Cyberinfrastructure that supports research in a number of scientific domains, including genomics, environmental science, biomedical and health science, and social science. xDCI leverages open-source software packages such as the integrated Rule Oriented Data System and the CyVerse Discovery Environment to address significant challenges in data storage, sharing, analysis, and visualization. We provide three example applications to evaluate xDCI in different domains: analysis of 3D images of mouse brains, video analysis of neonatal resuscitation, and risk analytics. Finally, we conclude with a discussion of potential improvements to xDCI.
Citations: 2
Memory-efficient parallel tensor decompositions
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091026
M. Baskaran, Thomas Henretty, B. Pradelle, M. H. Langston, David Bruns-Smith, J. Ezick, R. Lethin
Tensor decompositions are a powerful technique for enabling comprehensive and complete analysis of real-world data. Data analysis through tensor decompositions involves intensive computations over large-scale irregular sparse data. Optimizing the execution of such data intensive computations is key to reducing the time-to-solution (or response time) in real-world data analysis applications. As high-performance computing (HPC) systems are increasingly used for data analysis applications, it is becoming increasingly important to optimize sparse tensor computations and execute them efficiently on modern and advanced HPC systems. In addition to utilizing the large processing capability of HPC systems, it is crucial to improve memory performance (memory usage, communication, synchronization, memory reuse, and data locality) in HPC systems. In this paper, we present multiple optimizations that are targeted towards faster and memory-efficient execution of large-scale tensor analysis on HPC systems. We demonstrate that our techniques achieve reduction in memory usage and execution time of tensor decomposition methods when they are applied on multiple datasets of varied size and structure from different application domains. We achieve up to 11× reduction in memory usage and up to 7× improvement in performance. More importantly, we enable the application of large tensor decompositions on some important datasets on a multi-core system that would not have been feasible without our optimization.
Citations: 22
A distributed algorithm for the efficient computation of the unified model of social influence on massive datasets
Pub Date : 2017-09-01 DOI: 10.1109/HPEC.2017.8091084
Alex Popa, M. Frîncu, C. Chelmis
Online social networks offer a rich data source for analyzing diffusion processes, including rumor and viral spreading in communities. While many models exist, a unified model enabling analytical computation of complex, nonlinear phenomena while considering multiple factors was only recently proposed. We design an optimized implementation of the unified model of influence for vertex-centric distributed graph-processing platforms such as Apache Giraph. We validate and test the weak and strong scalability of our implementation on a Hadoop and Giraph installation on Google Cloud Platform using two real datasets. Results show a ∼3.2× performance improvement over the single-node runtime on the same platform. We also analyze the cost of achieving this speedup on public clouds, the impact of the underlying platform, and the need for more distributed nodes to process the same dataset compared to a shared-memory system.
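A vertex-centric platform such as Giraph runs synchronous supersteps in which each vertex folds incoming messages into its state and emits messages to neighbors. A toy single-process rendition of that execution model (our illustration of the programming model, not the paper's influence computation) is:

```python
def pregel(graph, init_value, compute, max_supersteps=30):
    """Toy synchronous vertex-centric (Pregel/Giraph-style) loop: every
    superstep, each vertex combines its inbox into its value and may emit
    messages; the loop halts once a superstep changes nothing."""
    values = {v: init_value(v) for v in graph}
    inbox = {v: [] for v in graph}
    for step in range(max_supersteps):
        outbox = {v: [] for v in graph}
        active = False
        for v in graph:
            new_value, messages = compute(v, values[v], inbox[v], step)
            if new_value != values[v] or messages:
                active = True
            values[v] = new_value
            for target, payload in messages:
                outbox[target].append(payload)
        inbox = outbox
        if not active:
            break
    return values

def make_max_label(graph):
    """Example compute function: propagate the maximum vertex id through
    each connected component (graph is {vertex: [neighbors]})."""
    def compute(v, value, msgs, step):
        new = max([value] + msgs)
        if step == 0 or new != value:
            return new, [(u, new) for u in graph[v]]
        return new, []
    return compute
```

On Giraph the same compute function runs distributed across workers with messages shipped over the network, which is why per-superstep message volume dominates the cost of models like the one in this paper.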
在线社交网络为分析谣言和病毒在社区中的传播过程提供了丰富的数据源。虽然存在许多模型,但一个统一的模型能够在考虑多种因素的情况下对复杂的非线性现象进行分析计算,直到最近才被提出。我们为Apache Giraph等以顶点为中心的分布式图形处理平台设计了统一影响模型的优化实现。我们使用两个真实的数据集在Google Cloud Platform Hadoop和Giraph安装上验证和测试了我们实现的弱和强可扩展性。结果显示,在同一平台上,与单节点运行时相比,性能提高了约3.2倍。我们还分析了在公共云上实现这种加速的成本,以及底层平台的影响,以及与共享内存系统相比,拥有更多分布式节点来处理相同数据集的需求。
Citations: 2
Journal: 2017 IEEE High Performance Extreme Computing Conference (HPEC)