
Latest publications: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Indexing Blocks to Reduce Space and Time Requirements for Searching Large Data Files
Tzu-Hsien Wu, Hao Shyng, J. Chou, Bin Dong, Kesheng Wu
Scientific discoveries increasingly rely on the analysis of massive amounts of data generated from scientific experiments, observations, and simulations. The ability to directly access the most relevant data records, without sifting through all of them, becomes essential. While many indexing techniques have been developed to quickly locate selected data records, the time and space required for building and storing these indexes are often too expensive to meet the demands of in situ or real-time data analysis. Existing indexing methods generally capture information about each individual data record; however, when reading a data record, the I/O system typically has to access a block or a page of data. In this work, we postulate that indexing blocks instead of individual data records could significantly reduce index size and index building time without increasing the I/O time for accessing the selected data records. Our experiments using multiple real datasets on a supercomputer show that the block index can reduce query time by a factor of 2 to 50 over other existing methods, including SciDB and FastQuery. Moreover, the size of the block index is almost negligible compared to the data size, and index building can proceed at the peak I/O speed.
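The block-indexing idea can be sketched in a few lines: keep one (min, max) summary per I/O block rather than one index entry per record, and prune whole blocks at query time. This is a minimal illustrative sketch, not the authors' implementation:

```python
import numpy as np

def build_block_index(data, block_size):
    """One (min, max) summary per fixed-size block instead of one entry per record."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [(b.min(), b.max()) for b in blocks]

def range_query(data, index, block_size, lo, hi):
    """Scan only blocks whose [min, max] range overlaps [lo, hi]."""
    hits = []
    for bid, (bmin, bmax) in enumerate(index):
        if bmax < lo or bmin > hi:
            continue                      # block pruned: no I/O for its records
        block = data[bid * block_size:(bid + 1) * block_size]
        hits.extend(v for v in block if lo <= v <= hi)
    return hits

data = np.array([1, 5, 3, 9, 12, 11, 20, 22, 2, 4], dtype=float)
idx = build_block_index(data, block_size=2)   # 5 summaries for 10 records
print(range_query(data, idx, 2, 10, 21))      # [12.0, 11.0, 20.0]
```

Because the I/O system fetches whole blocks anyway, pruning at block granularity loses no read efficiency while the index shrinks by a factor of the block size.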
DOI: 10.1109/CCGrid.2016.18 (published 2016-05-16)
Citations: 9
Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics
Bogdan Nicolae, Carlos H. A. Costa, Claudia Misale, K. Katrinis, Yoonho Park
Big data analytics is an indispensable tool in transforming science, engineering, medicine, healthcare, finance, and ultimately business itself. With the explosion of data sizes and the need for shorter time-to-solution, in-memory platforms such as Apache Spark are gaining popularity. However, this introduces important challenges, among which data shuffling is particularly difficult: on one hand, it is a key part of the computation that has a major impact on overall performance and scalability, so its efficiency is paramount; on the other hand, it needs to operate with scarce memory in order to leave as much memory as possible available for data caching. In this context, efficiently scheduling data transfers so as to address both dimensions of the problem simultaneously is non-trivial. State-of-the-art solutions often rely on simple approaches that yield suboptimal performance and resource usage. This paper contributes a novel shuffle data transfer strategy that dynamically adapts to the computation with minimal memory utilization, which we summarize as a series of design principles.
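A minimal sketch of shuffling under a memory budget, in the spirit of the problem described above; the fixed-size per-partition buffering scheme and all names are illustrative assumptions, not the paper's strategy:

```python
from collections import defaultdict

def shuffle(records, n_reducers, buffer_limit):
    """Hash-partition records; flush a partition's buffer when it hits the limit,
    so in-flight shuffle memory stays bounded at n_reducers * buffer_limit."""
    buffers = defaultdict(list)
    sent = defaultdict(list)          # stands in for network sends to reducers
    flushes = 0
    for key, value in records:
        part = hash(key) % n_reducers
        buffers[part].append((key, value))
        if len(buffers[part]) >= buffer_limit:
            sent[part].extend(buffers[part])   # transfer one full buffer
            buffers[part].clear()
            flushes += 1
    for part, buf in buffers.items():          # drain the remainders at the end
        sent[part].extend(buf)
    return dict(sent), flushes

records = [(i % 4, i) for i in range(12)]
sent, flushes = shuffle(records, n_reducers=2, buffer_limit=3)
```

The tension the paper describes is visible even here: a larger `buffer_limit` means fewer, larger transfers (better throughput) but more memory stolen from the data cache.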
DOI: 10.1109/CCGrid.2016.85 (published 2016-05-16)
Citations: 12
Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era
Omer Subasi, S. Di, L. Bautista-Gomez, Prasanna Balaprakash, O. Unsal, Jesús Labarta, A. Cristal, F. Cappello
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions (SDCs), or silent errors, are one of the major sources that corrupt the execution results of HPC applications without being detected. In this work, we explore a low-memory-overhead SDC detector, leveraging epsilon-insensitive support vector machine regression, to detect SDCs occurring in HPC applications that can be characterized by an impact error bound. The key contributions are threefold. (1) Our design takes spatial features (i.e., neighbouring data values for each data point in a snapshot) into the training data, such that little memory overhead (less than 1%) is introduced. (2) We provide an in-depth study of the detection ability and performance under different parameters, and we optimize the detection range carefully. (3) Experiments with eight real-world HPC applications show that our detector can achieve a detection sensitivity (i.e., recall) of up to 99% while suffering a false positive rate of less than 1% in most cases. Our detector incurs a low performance overhead, 5% on average, for all benchmarks studied in the paper. Compared with other state-of-the-art techniques, our detector exhibits the best tradeoff between detection ability and overheads.
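The detection skeleton can be sketched as follows; for brevity, a simple neighbor-mean predictor stands in for the paper's epsilon-insensitive SVR, but the spatial-feature idea (predict each point from its neighbours) and the error-bound test are the same shape:

```python
import numpy as np

def detect_sdc(snapshot, error_bound):
    """Flag points whose value deviates from a spatial-neighbor prediction by
    more than the impact error bound. A neighbor-mean predictor stands in here
    for the paper's epsilon-insensitive SVR."""
    pred = 0.5 * (snapshot[:-2] + snapshot[2:])       # predict interior points
    residual = np.abs(snapshot[1:-1] - pred)
    return np.nonzero(residual > error_bound)[0] + 1  # indices into snapshot

clean = np.sin(np.linspace(0, 3, 200))   # smooth field: no flags expected
corrupted = clean.copy()
corrupted[120] += 0.5                    # inject a silent-error-like corruption
print(detect_sdc(clean, 0.05))           # empty: smooth data passes
print(detect_sdc(corrupted, 0.05))       # flags indices around 120
```

The appeal of spatial features is that they come from the snapshot already in memory, which is why the memory overhead stays below 1%.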
DOI: 10.1109/CCGrid.2016.33 (published 2016-05-16)
Citations: 18
A Hybrid Simulation Model for Data Grids
M. Barisits, E. Kühn, M. Lassnig
Data grids are used in large-scale scientific experiments to access and store nontrivial amounts of data by combining the storage resources of multiple data centers into one system. This enables users and automated services to use the storage resources in a common and efficient way. However, as data grids grow, it becomes hard for developers and operators to estimate how modifications in policy, hardware, and software affect the performance metrics of the data grid. In this paper, we address the modeling of operational data grids. We first analyze the data grid middleware system of the ATLAS experiment at the Large Hadron Collider to identify components relevant to data grid performance. We describe existing modeling approaches for the pre-transfer, network, storage, and validation components, and build black-box models for these components. We then present a novel hybrid model, which unifies these separate component models, and we evaluate it using an event simulator. The evaluation is based on historic workloads extracted from the ATLAS data grid. The median evaluation error of the hybrid model is 22%.
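A toy illustration of the hybrid-model idea: separate black-box models for the pre-transfer, network, storage, and validation components are composed into one end-to-end estimate. All functional forms and coefficients here are made-up assumptions, not fitted ATLAS values:

```python
def pre_transfer_model(n_queued):
    return 2.0 + 0.5 * n_queued        # queuing delay before a transfer starts

def network_model(size_gb, bw_gbps):
    return size_gb / bw_gbps           # time on the link

def storage_model(size_gb):
    return size_gb / 5.0               # destination write time

def validation_model(size_gb):
    return 0.1 * size_gb               # checksum validation time

def hybrid_predict(size_gb, n_queued, bw_gbps):
    """Unify the per-component models into one end-to-end transfer-time estimate,
    mirroring the structure (not the content) of the paper's hybrid model."""
    return (pre_transfer_model(n_queued)
            + network_model(size_gb, bw_gbps)
            + storage_model(size_gb)
            + validation_model(size_gb))

print(hybrid_predict(10.0, 4, 1.25))   # 15.0 seconds under these toy coefficients
```

In the paper, an event simulator drives such composed models against historic workloads, which is how the 22% median error figure is obtained.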
DOI: 10.1109/CCGrid.2016.36 (published 2016-05-16)
Citations: 4
Management of Distributed Big Data for Social Networks
C. Leung, Hao Zhang
In the current era of Big Data, high volumes of a wide variety of valuable data can be easily collected and generated from a broad range of data sources of different veracities at high velocity. Due to the well-known 5V's of Big Data, many traditional data management approaches may not be suitable for handling it. Over the past few years, several applications and systems have been developed that use cluster, cloud, or grid computing to manage Big Data so as to support data science, Big Data analytics, as well as knowledge discovery and data mining. In this paper, we focus on distributed Big Data management. Specifically, we present our method for representing and managing distributed Big Data from social networks. We represent such big graph data in distributed settings so as to support big data mining of frequently occurring patterns in social networks.
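A minimal sketch of representing a social graph as adjacency lists spread over workers; the hash-partitioning scheme is an illustrative assumption, not the paper's representation:

```python
from collections import defaultdict

def partition_graph(edges, n_workers):
    """Distribute an undirected social graph as per-worker adjacency lists,
    assigning each vertex's neighbour list to worker (vertex_id % n_workers)."""
    workers = [defaultdict(list) for _ in range(n_workers)]
    for u, v in edges:
        workers[u % n_workers][u].append(v)   # integer ids: modulo as the hash
        workers[v % n_workers][v].append(u)   # undirected: store both endpoints
    return workers

edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
parts = partition_graph(edges, 2)   # worker 0 holds vertices 0, 2; worker 1 holds 1, 3
```

Each worker can then mine frequent patterns over its local adjacency lists and exchange only boundary information, which is the point of a distributed representation.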
DOI: 10.1109/CCGrid.2016.107 (published 2016-05-16)
Citations: 15
Sensor Data Air Pollution Prediction by Kernel Models
P. Vidnerová, Roman Neruda
Kernel-based neural networks are a popular machine learning approach with many successful applications. Regularization networks represent a special subclass with a solid theoretical background and a variety of learning possibilities. In this paper, we focus on single- and multi-kernel units; in particular, we describe the architecture of a product unit network and an evolutionary learning algorithm for setting its parameters, including the choice of kernels from a dictionary and the optimal split of inputs into individual products. The approach is tested on real-world data from the calibration of air-pollution sensor networks, and its performance is compared to several different regression tools.
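A product unit combines one-dimensional kernels multiplicatively; with the kernels and centers fixed, the output weights admit a closed-form least-squares fit. This sketch assumes RBF kernels and centers drawn from the data, whereas the paper evolves the kernel choices and the input split:

```python
import numpy as np

def rbf(x, c, gamma=1.0):
    return np.exp(-gamma * (x - c) ** 2)

def product_unit_features(X, centers):
    """Each hidden unit multiplies one-dimensional kernels, one per input
    dimension (here, every unit's product runs over all dimensions)."""
    # X: (n, d); centers: (m, d) -> feature matrix (n, m)
    return np.array([[np.prod(rbf(x, c)) for c in centers] for x in X])

def fit_output_weights(X, y, centers):
    """Least-squares output weights; the evolutionary algorithm in the paper
    would instead search over kernels and input splits around this inner fit."""
    Phi = product_unit_features(X, centers)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])    # smooth synthetic target
centers = X[:10]                          # centers picked from data for simplicity
w = fit_output_weights(X, y, centers)
pred = product_unit_features(X, centers) @ w
```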
DOI: 10.1109/CCGrid.2016.80 (published 2016-05-16)
Citations: 13
Faster: A Low Overhead Framework for Massive Data Analysis
Matheus Santos, Wagner Meira Jr, D. Guedes, Virgílio A. F. Almeida
With the recent accelerated increase in the amount of social data available on the Internet, several big data distributed processing frameworks have been proposed and implemented. Hadoop has been widely used to process all kinds of data, not only from social media. Spark is gaining popularity by offering a more flexible, object-functional programming interface, and also by improving performance in many cases. However, not all data analysis algorithms perform well on Hadoop or Spark. For instance, graph algorithms tend to generate large amounts of messages between processing elements, which may result in poor performance even in Spark. We introduce Faster, a low-latency distributed processing framework designed to exploit data locality to reduce processing costs in such algorithms. It offers an API similar to Spark's, but with a slightly different execution model and new operators. Our results show that it can significantly outperform Spark on large graphs, being up to one order of magnitude faster when running PageRank on a partial Google+ friendship graph with more than one billion edges.
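For reference, the PageRank benchmark mentioned above in its plain sequential form; this is the textbook algorithm, not Faster's or Spark's implementation:

```python
def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict {vertex: [out-neighbors]}.
    The message volume (one contribution per edge per iteration) is exactly what
    makes this workload shuffle-heavy on distributed frameworks."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        contrib = {v: 0.0 for v in adj}
        for v, out in adj.items():
            if out:
                share = rank[v] / len(out)
                for u in out:
                    contrib[u] += share
        rank = {v: (1 - damping) / n + damping * contrib[v] for v in adj}
    return rank

adj = {0: [1, 2], 1: [2], 2: [0]}
r = pagerank(adj)
```

On a billion-edge graph, the inner loop becomes per-edge messages between partitions, which is where locality-aware scheduling pays off.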
DOI: 10.1109/CCGrid.2016.90 (published 2016-05-16)
Citations: 2
Towards Fast Overlapping Community Detection
I. El-Helw, Rutger F. H. Hofman, H. Bal
Accelerating sequential algorithms in order to achieve high performance is often a nontrivial task. However, there are certain properties that can exacerbate this process and make it particularly daunting. For example, building an efficient parallel solution for a data-intensive algorithm requires a deep analysis of the memory access patterns and data reuse potential. Attempting to scale out the computations on clusters of machines introduces further complications due to network speed limitations. In this context, the optimization landscape can be extremely complex owing to the large number of trade-off decisions. In this paper, we discuss our experience designing two parallel implementations of an existing data-intensive machine learning algorithm that detects overlapping communities in graphs. The first design uses a single GPU to accelerate the computations of small data sets. We employed a code generation strategy in order to test and identify the best performing combination of optimizations. The second design uses a cluster of machines to scale out the computations for larger problem sizes. We used a mixture of MPI, RDMA and pipelining in order to circumvent networking overhead. Both these efforts bring us closer to understanding the complex relationships hidden within networks of entities.
DOI: 10.1109/CCGrid.2016.98 (published 2016-05-16)
Citations: 3
Online Power Estimation of Graphics Processing Units
Vignesh Adhinarayanan, Balaji Subramaniam, Wu-chun Feng
Accurate power estimation at runtime is essential for the efficient functioning of a power management system. While years of research have yielded accurate power models for the online prediction of instantaneous power for CPUs, such power models for graphics processing units (GPUs) are lacking. GPUs rely on low-resolution power meters that only nominally support basic power management. To address this, we propose an instantaneous power model, and in turn a power estimator, that uses performance counters in a novel way so as to deliver accurate power estimation at runtime. Our power estimator runs on two real NVIDIA GPUs to show that accurate runtime estimation is possible without the high-fidelity details that simulation-based power models assume. To construct our power model, we first use correlation analysis to identify a concise set of performance counters that work well despite GPU device limitations. Next, we explore several statistical regression techniques and identify the best one. Then, to improve the prediction accuracy, we propose a novel application-dependent modeling technique, where the model is constructed online at runtime, based on readings from a low-resolution, built-in GPU power meter. Our quantitative results show that a multi-linear model, which produces a mean absolute error of 6%, works the best in practice. An application-specific quadratic model reduces the error to nearly 1%. We show that this model can be constructed with low overhead and high accuracy at runtime. To the best of our knowledge, this is the first work attempting to model the instantaneous power of a real GPU system; earlier related work focused on average power.
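The multi-linear counter-based model can be illustrated with a least-squares fit on synthetic counter/power samples; the counter semantics and coefficients below are invented for the example, not measured GPU values:

```python
import numpy as np

def fit_power_model(counters, power):
    """Multi-linear model P ≈ w0 + w·c, fitted by least squares over samples of
    performance-counter readings and meter power."""
    X = np.column_stack([np.ones(len(power)), counters])
    w, *_ = np.linalg.lstsq(X, power, rcond=None)
    return w

def estimate_power(w, counter_sample):
    return w[0] + counter_sample @ w[1:]

rng = np.random.default_rng(1)
counters = rng.uniform(0, 1, size=(200, 3))      # utilization-style counters
true_w = np.array([30.0, 80.0, 25.0, 10.0])      # idle power + per-counter cost
power = true_w[0] + counters @ true_w[1:] + rng.normal(0, 0.5, 200)  # noisy meter
w = fit_power_model(counters, power)
```

The online, application-dependent variant in the paper refits such a model at runtime as low-resolution meter readings arrive, which is what lets the per-application model beat the generic one.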
DOI: 10.1109/CCGrid.2016.93 (published 2016-05-16)
Citations: 18
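The abstract's best-performing model is a multi-linear fit from performance-counter readings to instantaneous power. A minimal sketch of that idea, fitting the weights by ordinary least squares — the counter set, the synthetic data, and the function names here are illustrative assumptions, not the paper's actual code:

```python
# Sketch: fit instantaneous power as a multi-linear function of
# performance-counter readings, power ~= w . counters + b.
# All data below is synthetic; on real hardware the counters would come
# from a profiling API and the labels from the GPU's built-in power meter.
import numpy as np

def fit_power_model(counters, power):
    """Least-squares fit of power ~= counters @ w + b.

    counters: (n_samples, n_counters) array of counter readings
    power:    (n_samples,) array of measured power in watts
    Returns (w, b).
    """
    X = np.column_stack([counters, np.ones(len(power))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, power, rcond=None)
    return coef[:-1], coef[-1]

def predict_power(counters, w, b):
    return counters @ w + b

# Synthetic training set: 3 counters, ground-truth weights and a 50 W idle bias
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X @ np.array([20.0, 35.0, 10.0]) + 50.0

w, b = fit_power_model(X, y)
mae = np.mean(np.abs(predict_power(X, w, b) - y))
```

The paper's application-dependent variant would refit such a model online per application, using the low-resolution built-in meter readings as the `power` labels.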
cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs
Xiaodong Yu, Hao Wang, Wu-chun Feng, H. Gong, Guohua Cao
Algebraic reconstruction technique (ART) is an iterative algorithm for computed tomography (CT) image reconstruction. Due to the high computational cost, researchers turn to modern HPC systems with GPUs to accelerate the ART algorithm. However, the existing proposals suffer from inefficient designs of the compressed data structure and computational kernel on GPUs. In this paper, we identify the computational patterns in ART as the product of a sparse matrix (and its transpose) with multiple vectors (SpMV and SpMV_T). Because implementations with well-tuned libraries, including cuSPARSE, BRC, and CSR5, fall short of expectations, we propose cuART, a complete compression and parallelization solution for ART-based CT on GPUs. Based on the physical characteristics, i.e., the symmetries in the system matrix, we propose the symmetry-based CSR format (SCSR), which can further compress data storage by removing symmetric but redundant non-zero elements. Leveraging the sparsity patterns of X-ray projection, we transform the CSR format to multiple dense sub-matrices in SCSR. We then design a transposition-free kernel to optimize the data access for both SpMV and SpMV_T. The experimental results illustrate that our mechanism can reduce memory usage significantly and make practical datasets fit into a single GPU.
{"title":"cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs","authors":"Xiaodong Yu, Hao Wang, Wu-chun Feng, H. Gong, Guohua Cao","doi":"10.1109/CCGrid.2016.96","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.96","url":null,"abstract":"Algebraic reconstruction technique (ART) is an iterative algorithm for computed tomography (CT) image reconstruction. Due to the high computational cost, researchers turn to modern HPC systems with GPUs to accelerate the ART algorithm. However, the existing proposals suffer from inefficient designs of compressed data structure and computational kernel on GPUs. In this paper, we identify the computational patterns in the ART as the product of a sparse matrix (and its transpose) with multiple vectors (SpMV and SpMV_T). Because the implementations with well-tuned libraries, including cuSPARSE, BRC, and CSR5, underperform the expectations, we propose cuART, a complete compression and parallelization solution for the ART-based CT on GPUs. Based on the physical characteristics, i.e., the symmetries in the system matrix, we propose the symmetry-based CSR format (SCSR), which can further compress data storage by removing symmetric but redundant non-zero elements. Leveraging the sparsity patterns of X-ray projection, wetransform the CSR format to multiple dense sub-matrices in SCSR. We then design a transposition-free kernel to optimize the data access for both SpMV and SpMV_T. The experimental results illustrate that our mechanism can reduce memory usage significantly and make practical datasets fit into a single GPU. 
Our results also illustrate the superior performance of cuART compared to the existing methods on CPU and GPU.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116109076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
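The cuART abstract reduces ART to two products over one sparse system matrix: SpMV (y = A x) and SpMV_T (y = Aᵀ x), with a transposition-free kernel so the transpose is never stored. A toy illustration of that access pattern over plain CSR arrays — this is ordinary Python for clarity, not the paper's GPU code or its SCSR format:

```python
# Sketch: SpMV (row-wise gather) and SpMV_T (row-wise scatter) over the
# same CSR arrays, so A^T never has to be materialized.
import numpy as np

def spmv(indptr, indices, data, x, n_rows):
    """y = A x for a CSR matrix A: gather along each row."""
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def spmv_t(indptr, indices, data, x, n_cols):
    """y = A^T x using the same CSR arrays: scatter into columns."""
    y = np.zeros(n_cols)
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            y[indices[k]] += data[k] * x[i]
    return y

# The 2x3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form
indptr = np.array([0, 2, 3])
indices = np.array([0, 2, 1])
data = np.array([1.0, 2.0, 3.0])

yx = spmv(indptr, indices, data, np.array([1.0, 1.0, 1.0]), 2)   # A @ [1,1,1]
yt = spmv_t(indptr, indices, data, np.array([1.0, 2.0]), 3)      # A.T @ [1,2]
```

The gather/scatter asymmetry is the design point: a single CSR copy serves both kernels, though on a GPU the scatter in `spmv_t` would need atomics or a reordering to avoid write conflicts.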
Journal
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Copyright © 2023 Book学术 All rights reserved.