
Latest publications: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

QAMEM: Query Aware Memory Energy Management
Srinivasan Chandrasekharan, C. Gniady
As memory becomes cheaper, its use in computer systems has grown more prominent. The resulting increase in the number of memory modules raises the share of a computer system's overall energy consumption attributable to memory. As database systems become more memory-centric and put more pressure on the memory subsystem, managing the energy consumption of main memory is becoming critical. It is therefore important to exploit all memory idle times and the lower power states provided by newer memory architectures by placing memory in low-power modes using application-level cues. While there have been studies of CPU power consumption in database systems, only limited research has examined the role of memory in database systems with respect to energy management. We propose Query Aware Memory Energy Management (QAMEM), in which the database system uses query information and performance counters to provide application-level cues to the memory controller to switch to lower power states. Our results show that using QAMEM on TPC-H workloads saves 25% of total system energy compared to state-of-the-art memory energy management mechanisms.
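The core decision described above, choosing a deeper memory power state when query-level cues predict enough idle time, can be sketched as follows. The state names and break-even thresholds below are illustrative placeholders, not values from the paper:

```python
# Hypothetical sketch of query-aware power-state selection: the database
# passes a hint about the expected memory idle period of the next query
# phase, and the controller picks the deepest state whose entry/exit cost
# is amortized by that idle time.

# (state name, break-even idle time in microseconds), deepest state first.
POWER_STATES = [
    ("self-refresh", 1000.0),
    ("power-down", 10.0),
    ("standby", 0.0),
]

def pick_power_state(predicted_idle_us: float) -> str:
    """Return the deepest state whose break-even time the idle period covers."""
    for state, break_even_us in POWER_STATES:
        if predicted_idle_us >= break_even_us:
            return state
    return "active"  # unreachable for nonnegative inputs; kept as a safe default
```

In this sketch a long idle prediction (e.g., between query phases) selects self-refresh, while a short gap only justifies a shallow power-down.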
DOI: 10.1109/CCGRID.2018.00068
Citations: 3
A Network-Aware Scheduler in Data-Parallel Clusters for High Performance
Zhuozhao Li, Haiying Shen, Ankur Sarker
Despite the many shuffle-heavy jobs in current commercial data-parallel clusters, few previous studies have considered network traffic in the shuffle phase, which involves a large amount of data transfer and may adversely affect cluster performance. In this paper, we propose a network-aware scheduler (NAS) that handles two main challenges associated with the shuffle phase: i) balancing cross-node network load, and ii) avoiding and reducing cross-rack network congestion. NAS consists of three main mechanisms: i) map task scheduling (MTS), ii) congestion-avoidance reduce task scheduling (CA-RTS), and iii) congestion-reduction reduce task scheduling (CR-RTS). MTS constrains the shuffle data on each node when scheduling map tasks to balance cross-node network load. CA-RTS distributes each job's reduce tasks based on the distribution of its shuffle data among the racks in order to minimize cross-rack traffic. When the network is congested, CR-RTS schedules reduce tasks that generate negligible shuffle traffic to relieve the congestion. We implemented NAS in Hadoop on a cluster. Our trace-driven simulation and real-cluster experiments demonstrate the superior performance of NAS in improving throughput (by up to 62%), reducing average job execution time (by up to 44%), and reducing cross-rack traffic (by up to 40%) compared with state-of-the-art schedulers.
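As a rough illustration of the CA-RTS idea (not the authors' implementation), a job's reduce tasks can be allocated to racks in proportion to where its shuffle data already resides, so that most shuffle traffic stays rack-local:

```python
# Sketch: proportional reduce-task placement across racks, with
# largest-remainder rounding so the task count comes out exact.
# Rack names and byte counts are illustrative.

def place_reduce_tasks(shuffle_bytes_per_rack: dict, num_reduce_tasks: int) -> dict:
    """Map rack -> number of reduce tasks, proportional to shuffle bytes."""
    total = sum(shuffle_bytes_per_rack.values())
    quotas = {r: num_reduce_tasks * b / total
              for r, b in shuffle_bytes_per_rack.items()}
    placement = {r: int(q) for r, q in quotas.items()}
    # Hand out the tasks lost to truncation, largest fractional part first.
    leftover = num_reduce_tasks - sum(placement.values())
    for r in sorted(quotas, key=lambda r: quotas[r] - placement[r],
                    reverse=True)[:leftover]:
        placement[r] += 1
    return placement
```

A rack holding 60% of a job's shuffle output would then receive roughly 60% of its reduce tasks, keeping that share of shuffle traffic off the cross-rack links.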
DOI: 10.1109/CCGRID.2018.00015
Citations: 9
Secure and Dynamic Core and Cache Partitioning for Safe and Efficient Server Consolidation
Myeonggyun Han, Seongdae Yu, Woongki Baek
With server consolidation, latency-critical and batch workloads are collocated on the same physical servers. The resource manager dynamically allocates hardware resources to the workloads to maximize overall throughput while providing service-level objective (SLO) guarantees for the latency-critical workloads. Because hardware resources are dynamically allocated across the workloads on the same physical server, information leakage channels can be established, making the workloads vulnerable to micro-architectural side-channel attacks. Despite extensive prior work, the efficient design and implementation of a dynamic resource management system that maximizes resource efficiency without compromising SLO and security guarantees remains unexplored. To bridge this gap, this work proposes SDCP, secure and dynamic core and cache partitioning for safe and efficient server consolidation. In line with state-of-the-art dynamic server consolidation techniques, SDCP dynamically allocates hardware resources (i.e., cores and caches) to maximize resource utilization under SLO guarantees. In contrast to existing techniques, however, SDCP dynamically sanitizes hardware resources to ensure that no micro-architectural side channel is established between different security domains. Our experimental results demonstrate that SDCP provides high resource sanitization quality, incurs small performance overheads, and achieves high resource efficiency with SLO and security guarantees.
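A minimal sketch of the sanitization step, under the assumption of a software resource manager with a placeholder flush hook (real sanitization would rely on hardware mechanisms such as cache flushes and cache-way partitioning):

```python
# Sketch: a core (with its cache partition) is only handed to a different
# security domain after its residual micro-architectural state is cleared.
# The sanitize() body is a stand-in for hardware flush operations.

class CorePartition:
    def __init__(self, core_id: int):
        self.core_id = core_id
        self.owner_domain = None
        self.cache_dirty = True  # residual state from an unknown prior tenant

    def sanitize(self):
        # Placeholder for flushing caches, TLBs, branch predictors, etc.
        self.cache_dirty = False

def reassign(core: CorePartition, new_domain: str) -> CorePartition:
    """Sanitize only on cross-domain handoff; same-domain moves skip the cost."""
    if core.owner_domain != new_domain:
        core.sanitize()
    core.owner_domain = new_domain
    return core
```

Skipping sanitization for same-domain reassignment mirrors the paper's efficiency goal: the flush cost is paid only where a cross-domain side channel could otherwise form.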
DOI: 10.1109/CCGRID.2018.00046
Citations: 6
Distributed Cloud Cache
J. Al-Jaroodi, N. Mohamed
The concept of caching has proven to be very beneficial in many domains. This paper introduces the concept of a distributed cloud cache. It uses small caches available on servers or fog nodes across the Internet to reduce the load on data centers. Dual-direction parallel transfer is used to improve download times of newly released software, games, and movies from the release point to the clients.
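The dual-direction idea can be sketched as a fetch plan in which one source streams the file forward from the start while a second streams it backward from the end, meeting in the middle; the source names, offsets, and chunk size here are hypothetical:

```python
# Sketch: build a non-overlapping fetch plan that alternates between a
# "head" source working forward and a "tail" source working backward.

def dual_direction_plan(file_size: int, chunk: int):
    """Return a list of (source, offset, length) fetches covering the file."""
    front, back = 0, file_size
    plan = []
    while front < back:
        n = min(chunk, back - front)
        plan.append(("head-source", front, n))
        front += n
        if front >= back:
            break
        n = min(chunk, back - front)
        back -= n
        plan.append(("tail-source", back, n))
    return plan
```

Because the two cursors stop when they meet, the plan covers every byte exactly once, and the two sources can execute their fetches in parallel.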
DOI: 10.1109/CCGRID.2018.00-33
Citations: 2
Addressing the Challenges of Executing a Massive Computational Cluster in the Cloud
Brandon Posey, Christopher Gropp, Boyd Wilson, Boyd McGeachie, S. Padhi, Alexander Herzog, A. Apon
A major limitation on time-to-science can be the lack of available computing resources. Depending on resource capacity, executing an application suite with hundreds of thousands of jobs can take weeks when resources are in high demand. We describe how we dynamically provision a large-scale high performance computing cluster of more than one million cores utilizing Amazon Web Services (AWS). We discuss the trade-offs, challenges, and solutions associated with creating such a large-scale cluster from commercial cloud resources. We utilize our large-scale cluster to study a parameter-sweep workflow composed of message-passing parallel topic modeling jobs on multiple datasets. At peak, we achieve a simultaneous core count of 1,119,196 vCPUs across nearly 50,000 instances, and are able to execute almost half a million jobs within two hours utilizing AWS Spot Instances in a single AWS region. Our solutions to these challenges and trade-offs have broad application to the lifecycle management of similar clusters on other commercial clouds.
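A hedged sketch of the kind of provisioning loop such a system might run: bid for spot capacity across several instance types until a target vCPU count is reached, tolerating partial fulfilment. `request_spot` is a hypothetical stand-in for a real cloud API call (e.g., through boto3) and is injected so the loop stays self-contained:

```python
# Sketch: greedy spot provisioning across instance types, largest first.
# Instance type names and vCPU counts are assumptions for illustration.

def provision(target_vcpus: int, instance_types: dict, request_spot) -> int:
    """instance_types maps type name -> vCPUs per instance.
    request_spot(itype, count) returns how many instances were granted.
    Returns the total vCPUs acquired (may exceed or fall short of target)."""
    acquired = 0
    # Prefer larger instances to reduce the number of requests/instances.
    for itype, vcpus in sorted(instance_types.items(), key=lambda kv: -kv[1]):
        while acquired < target_vcpus:
            wanted = (target_vcpus - acquired + vcpus - 1) // vcpus  # ceil
            granted = request_spot(itype, wanted)  # may be < wanted
            acquired += granted * vcpus
            if granted < wanted:
                break  # capacity exhausted for this type; try the next one
    return acquired
```

When spot capacity for the biggest type runs out, the loop falls through to smaller types, which is one way a single region can be drained toward a million-core target.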
DOI: 10.1109/CCGRID.2018.00040
Citations: 7
Service-Oriented Architecture for Big Data Analytics in Smart Cities
J. Al-Jaroodi, N. Mohamed
The smart city has recently become an aspiration for many cities around the world. These cities are looking to apply the smart city concept to improve sustainability, residents' quality of life, and economic development. The smart city concept depends on employing a wide range of advanced technologies to improve the performance of various services and activities such as transportation, energy, healthcare, and education, while at the same time improving the city's resource utilization and creating new business opportunities. One promising technology for supporting such efforts is big data. Effective and intelligent use of the big data accumulated over time in various sectors can offer many advantages for enhancing decision making in smart cities. In this paper, we identify the different types of decision-making processes involved in smart cities. We then propose a service-oriented architecture to support big data analytics for decision making in smart cities. This architecture allows different technologies, such as fog and cloud computing, to be integrated to support the different types of analytics and decision-making operations needed to effectively utilize available big data. It provides different functions and capabilities for using big data and exposes smart capabilities as services that the architecture supports. As a result, different big data applications will be able to access and use these services for varying purposes within the smart city.
DOI: 10.1109/CCGRID.2018.00052
Citations: 13
Toward Scalable and Asynchronous Object-Centric Data Management for HPC
Houjun Tang, S. Byna, François Tessier, Teng Wang, Bin Dong, Jingqing Mu, Q. Koziol, Jérome Soumagne, V. Vishwanath, Jialin Liu, R. Warren
Emerging high performance computing (HPC) systems are expected to be deployed with an unprecedented level of complexity due to a deep system memory and storage hierarchy. Efficient and scalable methods of data management and movement through this hierarchy are critical for scientific applications using exascale systems. Moving toward new paradigms for scalable I/O in the extreme-scale era, we introduce novel object-centric data abstractions and storage mechanisms that take advantage of the deep storage hierarchy, named Proactive Data Containers (PDC). In this paper, we formulate object-centric PDCs and their mappings onto different levels of the storage hierarchy. PDC adopts a client-server architecture with a set of servers managing data movement across storage layers. To demonstrate the effectiveness of the proposed PDC system, we have measured the performance of benchmarks and I/O kernels from scientific simulation and analysis applications using the PDC programming interface, and compared the results with existing highly tuned I/O libraries. Using asynchronous I/O along with data and metadata optimizations, PDC demonstrates up to 23× speedup over HDF5 and PLFS in writing and reading data from a plasma physics simulation. PDC achieves performance comparable to HDF5 and PLFS in reading and writing data of a single timestep at small scale, and outperforms them at scales larger than 10K cores. In contrast to existing storage systems, PDC offers user-space data management with the flexibility to allocate the number of PDC servers depending on the workload.
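A toy sketch of an object-centric interface in the spirit of PDC; the class, method names, and tier labels are illustrative assumptions, not the actual PDC API:

```python
# Sketch: applications put and get named objects in a container, while the
# runtime remains free to migrate objects across storage tiers behind this
# abstraction (in PDC, such movement can happen asynchronously).

class Container:
    def __init__(self, name: str):
        self.name = name
        self._objects = {}  # object name -> (data, tier)

    def put(self, obj_name: str, data: bytes, tier: str = "memory"):
        # The runtime may later move the object to a slower, larger tier.
        self._objects[obj_name] = (data, tier)

    def get(self, obj_name: str) -> bytes:
        # Callers address objects by name, never by tier or file path.
        data, _tier = self._objects[obj_name]
        return data

    def migrate(self, obj_name: str, new_tier: str):
        # Tier changes are invisible to get(): same name, same bytes.
        data, _ = self._objects[obj_name]
        self._objects[obj_name] = (data, new_tier)
```

The point of the sketch is the decoupling: because applications name objects rather than locations, the runtime can reshuffle data across the memory/storage hierarchy without breaking readers.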
DOI: 10.1109/CCGRID.2018.00026
Citations: 24
Information Centric Networking for Sharing and Accessing Digital Objects with Persistent Identifiers on Data Infrastructures
Spiros Koulouzis, Rahaf Mousa, Andreas Karakannas, C. D. Laat, Zhiming Zhao
Persistent identifiers (PIDs) such as Digital Object Identifiers (DOIs) provide a unique and persistent way to identify and cite digital objects such as publications, media content, and research data. They are widely used by data producers to catalogue and publish digital assets and research data. Nowadays, research infrastructures (RIs) offer services not only for accessing and publishing data objects, but also for processing data on user demand, e.g., via scientific workflows or third-party virtual research environments. However, efficiently retrieving and sharing digital objects in a shared data processing environment requires knowledge of application access patterns as well as the underlying network-level distribution. As the number and size of data objects increase, optimizing data discovery and access among distributed partners on shared infrastructure emerges as an important challenge for infrastructure operators seeking to maintain quality of service and user experience. In this paper, we propose a novel approach that utilizes Information Centric Networking (ICN) to retrieve content based on PIDs while optimizing data access on shared infrastructure.
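As a minimal illustration of the resolution step, assuming a hypothetical replica table and hop counts, a PID can be resolved by name to the topologically closest replica rather than a fixed location:

```python
# Sketch: name-based resolution in the ICN spirit. The replica table and
# hop-count map are assumptions standing in for routing/forwarding state.

def resolve_pid(pid: str, replicas: dict, hops: dict) -> str:
    """replicas: pid -> list of node names holding the object;
    hops: node name -> network distance from the requester.
    Returns the closest node serving the object."""
    candidates = replicas.get(pid, [])
    if not candidates:
        raise KeyError(f"no replica registered for {pid}")
    return min(candidates, key=lambda node: hops[node])
```

Requesters thus never need to know where an object lives; the same PID resolves to different replicas depending on where the request originates.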
DOI: 10.1109/CCGRID.2018.00098
Citations: 10
Pedigree-ing Your Big Data: Data-Driven Big Data Privacy in Distributed Environments
A. Cuzzocrea, E. Damiani
This paper introduces a general framework for supporting data-driven privacy-preserving big data management in distributed environments, such as emerging Cloud settings. The proposed framework can be viewed as an alternative to classical approaches in which the privacy of big data is ensured via security-inspired protocols that check several (protocol) layers to achieve the desired privacy. Unfortunately, this injects considerable computational overhead into the overall process, introducing challenges that must be considered. Our approach instead recognizes the "pedigree" of suitable summary-data representatives computed on top of the target big data repositories, thereby avoiding the computational overhead of protocol checking. We also provide a concrete realization of this framework, the Data-dRIven aggregate-PROvenance privacy-preserving big Multidimensional data (DRIPROM) framework, which specifically considers multidimensional data as the case of interest.
{"title":"Pedigree-ing Your Big Data: Data-Driven Big Data Privacy in Distributed Environments","authors":"A. Cuzzocrea, E. Damiani","doi":"10.1109/CCGRID.2018.00100","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00100","url":null,"abstract":"This paper introduces a general framework for supporting data-driven privacy-preserving big data management in distributed environments, such as emerging Cloud settings. The proposed framework can be viewed as an alternative to classical approaches where the privacy of big data is ensured via security-inspired protocols that check several (protocol) layers in order to achieve the desired privacy. Unfortunately, this injects considerable computational overheads in the overall process, thus introducing relevant challenges to be considered. Our approach instead tries to recognize the \"pedigree\" of suitable summary data representatives computed on top of the target big data repositories, hence avoiding computational overheads due to protocol checking. We also provide a relevant realization of the framework above, the so-called Data-dRIven aggregate-PROvenance privacypreserving big Multidimensional data (DRIPROM) framework, which specifically considers multidimensional data as the case of interest.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115580238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 10
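The summary-representative idea above can be made concrete with a small sketch. This is only an illustration under assumed structures — the DRIPROM internals are not given in the abstract. Aggregate cells are computed over selected dimensions and tagged with provenance metadata (the "pedigree"), so consumers receive summaries plus their lineage rather than raw records:

```python
# Illustrative sketch only (not the DRIPROM implementation): release
# aggregate "summary representatives" of a multidimensional dataset,
# tagged with provenance, while the raw rows never leave the repository.

from collections import defaultdict

def summarize(records, dims, measure, source):
    """Group-by aggregation over the chosen dimensions of a data cube."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in records:
        key = tuple(r[d] for d in dims)   # one cell per dimension combination
        sums[key] += r[measure]
        counts[key] += 1
    return {
        "dims": dims,
        "cells": {k: {"sum": sums[k], "count": counts[k]} for k in sums},
        "provenance": {"source": source, "measure": measure},  # the "pedigree"
    }

# usage: only (region, year) aggregates plus lineage are exposed
records = [
    {"region": "EU", "year": 2018, "sales": 10.0},
    {"region": "EU", "year": 2018, "sales": 5.0},
    {"region": "US", "year": 2018, "sales": 7.0},
]
cube = summarize(records, ("region", "year"), "sales", source="warehouse-A")
```

The privacy gain here comes from the data shape itself — consumers see aggregates with known lineage — rather than from per-layer protocol checks.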
Uncertainty-Aware Elastic Virtual Machine Scheduling for Stream Processing Systems
Shigeru Imai, S. Patterson, Carlos A. Varela
Stream processing systems deployed on the cloud need to be elastic to effectively accommodate workload variations over time. Performance models can predict maximum sustainable throughput (MST) as a function of the number of VMs allocated. We present a scheduling framework that incorporates three statistical techniques to improve Quality of Service (QoS) of cloud stream processing systems: (i) uncertainty quantification to account for variance in the MST model; (ii) online learning to update the MST model as new performance metrics are gathered; and (iii) workload models to predict input data stream rates, assuming regular patterns occur over time. Our framework can be parameterized by a QoS satisfaction target that statistically finds the best performance/cost tradeoff. Our results illustrate that each of the three techniques alone significantly improves QoS, from 52% to 73-81% QoS satisfaction rates on average for eight benchmark applications.
{"title":"Uncertainty-Aware Elastic Virtual Machine Scheduling for Stream Processing Systems","authors":"Shigeru Imai, S. Patterson, Carlos A. Varela","doi":"10.1109/CCGRID.2018.00021","DOIUrl":"https://doi.org/10.1109/CCGRID.2018.00021","url":null,"abstract":"Stream processing systems deployed on the cloud need to be elastic to effectively accommodate workload variations over time. Performance models can predict maximum sustainable throughput (MST) as a function of the number of VMs allocated. We present a scheduling framework that incorporates three statistical techniques to improve Quality of Service (QoS) of cloud stream processing systems: (i) uncertainty quantification to consider variance in the MST model; (ii) online learning to update MST model as new performance metrics are gathered; and (iii) workload models to predict input data stream rates assuming regular patterns occur over time. Our framework can be parameterized by a QoS satisfaction target that statistically finds the best performance/cost tradeoff. Our results illustrate that each of the three techniques alone significantly improves QoS, from 52% to 73-81% QoS satisfaction rates on average for eight benchmark applications. 
Furthermore, applying all three techniques allows us to reach 98.62% QoS satisfaction rate with a cost less than twice the cost of the optimal (in hindsight) VM allocations, and half of the cost of allocating VMs for the peak demand in the workload.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116621462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 18
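The uncertainty-quantification step above can be sketched as a probabilistic capacity check. Assuming, purely for illustration, a Gaussian MST model with a mean and standard deviation per VM count (the paper's actual model and parameters are not reproduced here), the scheduler picks the smallest allocation that sustains the forecast input rate with probability at least the QoS satisfaction target:

```python
# Hedged sketch of the idea, not the authors' code: choose the smallest VM
# count n with P(MST(n) >= predicted_rate) >= qos_target, under an assumed
# Gaussian MST model. mst_mean/mst_std below are illustrative toy functions.

import math

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and std sigma, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def vms_needed(predicted_rate, qos_target, mst_mean, mst_std, max_vms=64):
    """Smallest n such that P(MST(n) >= predicted_rate) >= qos_target."""
    for n in range(1, max_vms + 1):
        p_sustain = 1.0 - normal_cdf(predicted_rate, mst_mean(n), mst_std(n))
        if p_sustain >= qos_target:
            return n
    return max_vms  # cap: never allocate beyond the configured maximum

# toy model: each VM adds ~1000 tuples/s of MST; variance grows with n
mst_mean = lambda n: 1000.0 * n
mst_std = lambda n: 50.0 * math.sqrt(n)

n = vms_needed(predicted_rate=3500.0, qos_target=0.95,
               mst_mean=mst_mean, mst_std=mst_std)
```

Raising the QoS target trades cost for confidence: a higher target forces the scheduler toward larger allocations whenever the MST estimate is noisy.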
Journal
2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)