Venue: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00-69
Title: Bandwidth Scheduling with Flexible Multi-paths in High-Performance Networks
Authors: Xiaoyang Zhang, C. Wu, Liudong Zuo, Aiqin Hou, Yongqiang Wang
Abstract: Modern data-intensive applications require the transfer of big data over high-performance networks (HPNs) through bandwidth reservation for purposes such as data storage and analysis. The key performance metrics for bandwidth scheduling are the utilization of network resources and the satisfaction of user requests. In this paper, for a given batch of Deadline-Constrained Bandwidth Reservation Requests (DCBRRs), we attempt to maximize the number of satisfied requests with flexible scheduling options over link-disjoint paths in an HPN while achieving the best average Earliest Completion Time (ECT) or Shortest Duration (SD) of scheduled requests. We further consider this problem under two bandwidth-oriented principles: (i) the Minimum Bandwidth Principle (MINBP) and (ii) the Maximum Bandwidth Principle (MAXBP). We show that both problem variants are NP-complete and propose two polynomial-time heuristic algorithms for each. We conduct bandwidth scheduling experiments on both small- and large-scale DCBRRs in a real-life HPN topology for performance comparison. Extensive results show the superiority of the proposed algorithms over existing ones.
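The ECT objective above can be illustrated with a toy single-link scheduler. Everything here (the function name, the slotted bandwidth model) is a hypothetical simplification for intuition only; it is not the paper's MINBP/MAXBP heuristics, which operate over link-disjoint paths in an HPN.

```python
# Toy sketch of deadline-constrained bandwidth reservation with an
# Earliest-Completion-Time flavor. Hypothetical model: one link, unit
# time slots, each slot has some bandwidth already reserved.

def schedule_ect(capacity, reserved, size, deadline):
    """Greedily consume leftover bandwidth in each unit time slot until
    `size` data units are transferred; return the 1-based slot by which
    the transfer completes, or None if the deadline cannot be met."""
    remaining = size
    for t in range(deadline):
        already = reserved[t] if t < len(reserved) else 0
        remaining -= max(capacity - already, 0)
        if remaining <= 0:
            return t + 1
    return None

# Link capacity 10/slot, slot 0 already half-reserved, 25 units of data
# due within 4 slots: free bandwidth is 5, 10, 10, 10 per slot, so the
# cumulative transfer reaches 25 during the third slot.
print(schedule_ect(10, [5], 25, 4))   # -> 3
print(schedule_ect(10, [5], 100, 4))  # -> None (deadline cannot be met)
```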
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00090
Title: Efficient Messaging for Java Applications Running in Data Centers
Authors: Kevin Beineke, Stefan Nothaas, M. Schöttner
Abstract: Big data and large-scale Java applications often aggregate the resources of many servers. Low-latency and high-throughput network communication is important if the applications have to process many concurrent interactive queries. We designed DXNet to address these challenges by providing fast object de-/serialization, automatic connection management, and zero-copy messaging. The latter includes asynchronous messages as well as synchronous requests/responses and an event-driven message receiving approach. DXNet is optimized for small messages (< 64 bytes) in order to support highly interactive web applications, e.g., graph-based information retrieval, but works well with larger messages (e.g., 8 MB) too. DXNet is available as a standalone component on GitHub, and its modular design is open to different transports, currently supporting Ethernet and InfiniBand. The evaluation with micro-benchmarks and YCSB using Ethernet and InfiniBand shows request-response latencies below 10 µs (round-trip) including object de-/serialization, as well as a maximum throughput of more than 9 GByte/s.
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00067
Title: A Low-Latency Memory-Efficient IPv6 Lookup Engine Implemented on FPGA Using High-Level Synthesis
Authors: Thibaut Stimpfling, J. Langlois, N. Bélanger, Y. Savaria
Abstract: The emergence of 5G networks and real-time applications across networks has a strong impact on the performance requirements of IP lookup engines. These engines must support not only high-bandwidth but also low-latency lookup operations. This paper presents the hardware architecture of a low-latency IPv6 lookup engine capable of supporting the bandwidth of current Ethernet links. The engine implements the SHIP lookup algorithm, which exploits prefix characteristics to build a compact and scalable data structure. The proposed hardware architecture leverages the characteristics of the data structure to support low-latency lookup operations while making efficient use of memory. The architecture is described in C++, synthesized with a high-level synthesis tool, then implemented on a Virtex-7 FPGA. Compared to other well-known approaches, which use at least 87% more memory per prefix, the proposed IPv6 lookup architecture reduces lookup latency by a factor of 2.3× and uses as much as 46% less memory per prefix for a synthetic prefix table holding 580 k entries.
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00076
Title: CloudRanger: Root Cause Identification for Cloud Native Systems
Authors: Ping Wang, Jingmin Xu, Meng Ma, Weilan Lin, Disheng Pan, Y. Wang, Pengfei Chen
Abstract: As more and more systems migrate to cloud environments, cloud native systems are becoming a trend. This paper presents the challenges and implications of diagnosing root causes in cloud native systems by analyzing real incidents that occurred in IBM Bluemix (a large commercial cloud). To tackle these challenges, we propose CloudRanger, a novel system dedicated to cloud native systems. To make the system more general, we propose a dynamic causal relationship analysis approach that constructs impact graphs amongst applications without a given topology. A heuristic investigation algorithm based on a second-order random walk is proposed to identify the culprit services responsible for cloud incidents. Experimental results in both a simulation environment and the IBM Bluemix platform show that CloudRanger outperforms some state-of-the-art approaches with a 10% improvement in accuracy. It offers fast identification of culprit services when an anomaly occurs. Moreover, the system can be deployed rapidly and easily in multiple kinds of cloud native systems without any predefined knowledge.
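The random-walk idea above can be conveyed with a much simpler first-order walk-with-restart sketch. The service names and topology below are hypothetical, and CloudRanger's actual algorithm is second-order (the next hop depends on the two previously visited services) and weights transitions by anomaly correlation, both of which are omitted here.

```python
import random

# Simplified first-order sketch of random-walk culprit ranking on an
# impact graph: repeatedly walk dependency edges starting from the
# anomalous front-end service; frequently visited services are ranked
# as more likely culprits.

def rank_culprits(impact, start, steps=10000, restart=0.15, seed=42):
    """impact: dict mapping a service to the services it depends on.
    Returns services sorted by visit count, most-visited first."""
    rng = random.Random(seed)
    visits = {s: 0 for s in impact}
    node = start
    for _ in range(steps):
        visits[node] += 1
        nbrs = impact.get(node, [])
        if not nbrs or rng.random() < restart:
            node = start              # restart at the anomalous service
        else:
            node = rng.choice(nbrs)   # follow a dependency edge
    return sorted(visits, key=visits.get, reverse=True)

# Hypothetical topology: frontend depends on api; api on db and cache.
graph = {"frontend": ["api"], "api": ["db", "cache"], "db": [], "cache": []}
print(rank_culprits(graph, "frontend"))
```

With a real anomaly signal, the restart distribution and edge weights would be biased toward services whose metrics correlate with the incident, rather than uniform as in this sketch.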
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00078
Title: SHMEMGraph: Efficient and Balanced Graph Processing Using One-Sided Communication
Authors: Huansong Fu, Manjunath Gorentla Venkata, Shaeke Salman, N. Imam, Weikuan Yu
Abstract: State-of-the-art synchronous graph processing frameworks face both inefficiency and imbalance issues that cause their performance to be suboptimal. These issues include the inefficiency of communication and the imbalanced graph computation/communication costs within an iteration. We propose to replace their conventional two-sided communication model with its one-sided counterpart. Accordingly, we design SHMEMGraph, an efficient and balanced graph processing framework that is formulated across a global memory space and takes advantage of the flexibility and efficiency of one-sided communication for graph processing. Through an efficient one-sided communication channel, SHMEMGraph utilizes high-performance RDMA operations while minimizing resource contention within a compute node. In addition, SHMEMGraph synthesizes a number of optimizations to address both computation imbalance and communication imbalance. Using a graph of 1 billion edges, our evaluation shows that compared to the state-of-the-art Gemini framework, SHMEMGraph achieves an average improvement of 35.5% in job completion time across five representative graph algorithms.
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00033
Title: AKIN: A Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems
Authors: Wei Zhang, Yong Chen, Dong Dai
Abstract: Many graph-related applications face the challenge of managing excessive and ever-growing graph data in a distributed environment. It is therefore necessary to use a graph partitioning algorithm to distribute graph data onto multiple machines as the data arrives. Balancing data distribution and minimizing the edge-cut ratio are the two basic pursuits of the graph partitioning problem. While achieving balanced partitions for streaming graphs is easy, existing graph partitioning algorithms either fail to work on streaming workloads or leave the edge-cut ratio to be further improved. Our research aims to provide a better solution for streaming graph partitioning in a distributed system, one that further reduces the edge-cut ratio while maintaining rough balance among all partitions. We exploit a similarity measure on vertex degrees to gather structurally related vertices in the same partition as much as possible; this reduces the edge-cut ratio even further compared to the state-of-the-art streaming graph partitioning algorithm FENNEL. Our evaluation shows that our algorithm achieves better partitioning quality in terms of edge-cut ratio (up to 20% reduction compared to FENNEL) while maintaining a decent balance between all partitions, and this improvement applies to various real-life graphs.
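The baseline that AKIN improves on can be sketched as a greedy streaming assignment in the style of FENNEL: each arriving vertex goes to the partition holding the most of its already-placed neighbors, minus a penalty for partition size. The code below is that simplified baseline only; AKIN's degree-similarity term, which is the paper's contribution, is deliberately omitted, and all names are hypothetical.

```python
# FENNEL-style greedy streaming vertex assignment (simplified sketch;
# AKIN additionally weighs degree similarity between vertices, which
# this baseline omits).

def assign(vertex, neighbors, parts, capacity, alpha=1.0):
    """Place `vertex` into the partition containing the most of its
    already-placed neighbors, penalized by partition occupancy to keep
    partitions balanced. parts: list of sets of assigned vertices."""
    def score(p):
        return len(p & neighbors) - alpha * len(p) / capacity
    best = max(range(len(parts)), key=lambda i: score(parts[i]))
    parts[best].add(vertex)
    return best

# Stream of (vertex, already-seen neighbors): a tight triangle a-b-c
# stays together, while the isolated vertex d lands in the emptier
# partition for balance.
parts = [set(), set()]
stream = [("a", set()), ("b", {"a"}), ("c", {"a", "b"}), ("d", set())]
for v, nbrs in stream:
    assign(v, nbrs, parts, capacity=2)
print(parts)  # -> [{'a', 'b', 'c'}, {'d'}]
```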
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00091
Title: Improving Data Integrity in Linux Software RAID with Protection Information (T10-PI)
Authors: Baoquan Zhang, R. Rajachandrasekar, Alireza Haghdoost, Lance Evans, D. Du
Abstract: The T10 DIF (Data Integrity Field) and DIX (Data Integrity Extension) specifications provide mechanisms to guarantee end-to-end data integrity and protection in the face of silent data corruption in modern storage systems. However, the Multiple Devices (MD) software RAID driver in Linux does not fully leverage these capabilities to provide such end-to-end guarantees with widely-used RAID modes such as 5 and 6, causing an "integrity gap" in the Linux I/O stack. This paper describes the design and performance characteristics of a DIX-aware MD module that plugs this integrity gap with minimal overhead to client applications. A PI (Protection Information) operator is added to MD to handle PI-related operations, and dedicated PI buffers are allocated and managed in the stripe structures of the MD RAID-5/6 personality to generate, store, and verify the PI. This allows seamless exchange of PI among end applications running in user mode, file systems, the Linux block layer, and PI-capable HBAs and drives. Our evaluations show that the DIX-aware MD module can detect silent data corruption with a tolerable performance penalty.
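For context, the 8-byte DIF tuple the paper builds on has a fixed layout in the T10 specification: a 2-byte guard tag (CRC-16 with polynomial 0x8BB7 over the sector data), a 2-byte application tag, and a 4-byte reference tag (typically derived from the LBA). The sketch below computes that tuple for one sector; it is illustrative only, not the MD module's kernel code.

```python
import struct

def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF (polynomial 0x8BB7, init 0, no bit reflection):
    the checksum used for the 2-byte guard tag of a DIF tuple."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def dif_tuple(sector: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Build the 8-byte PI tuple for one 512-byte sector: guard tag over
    the sector data, then application tag, then reference tag."""
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, ref_tag)

# All-zero sector: the guard CRC is 0, so only the reference tag shows.
print(dif_tuple(bytes(512), 0, 7).hex())  # -> '0000000000000007'
```

A PI-capable drive recomputes the guard tag on every read and write, so corruption anywhere along the path is caught as a tag mismatch instead of surfacing as silent data corruption.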
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00034
Title: RISP: A Reconfigurable In-Storage Processing Framework with Energy-Awareness
Authors: Xiaojia Song, T. Xie, Wen Pan
Abstract: Existing in-storage processing (ISP) techniques mainly focus on maximizing the data processing rate by always utilizing all storage data processing resources for all applications. We find that this "always running in full gear" strategy wastes energy for applications with low data processing complexity. In this paper we propose RISP (Reconfigurable ISP), an energy-aware reconfigurable ISP framework that employs FPGAs as data processing cells and NVM controllers. It can reconfigure storage data processing resources to achieve high energy efficiency without any performance degradation for big data analysis applications. RISP is modeled and then validated on an FPGA board. Experimental results show that, compared with traditional host-CPU-based computing, RISP (with 16 channels or more) improves performance by 1.6-25.4× while saving energy by a factor of 2.2-161. Further, its reconfigurability can provide up to 77.2% additional energy savings by judiciously enabling only the data processing resources that are sufficient for an application.
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00059
Title: Location, Location, Location: Exploring Amazon EC2 Spot Instance Pricing Across Geographical Regions
Authors: Nnamdi Ekwe-Ekwe, A. Barker
Abstract: Cloud computing is a ubiquitous part of the computing landscape. For many companies today, moving their computing infrastructure to the cloud reduces time to deployment and saves money. Spot Instances, a subset of Amazon's cloud computing infrastructure (EC2), expand upon this: they allow a user to bid on spare compute capacity in EC2 at heavily discounted prices. If other bids for the spare capacity exceed the user's own, the user's instance is terminated. In this paper, we conduct one of the first detailed analyses of how location affects the overall cost of deploying a Spot Instance. We analyse pricing data across all available AWS regions for 60 days for a variety of Spot Instances. We relate the pricing data to the overall AWS region and examine patterns across the week. We find that location plays a critical role in Spot Instance pricing and that prices differ, sometimes markedly, from region to region. We conclude by showing that it is indeed possible to run workloads on Spot Instances with a low risk of termination and a low overall cost.
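The kind of per-region comparison described above reduces to simple summary statistics over price histories. The sketch below uses entirely made-up sample prices (not the paper's measured data, and not real AWS prices) just to show the shape of the analysis.

```python
from statistics import mean, pstdev

# Hypothetical spot price samples (USD/hour) for one instance type in
# three regions; the numbers are invented for illustration only.
samples = {
    "us-east-1":      [0.031, 0.034, 0.030, 0.052, 0.033],
    "eu-west-1":      [0.041, 0.040, 0.042, 0.041, 0.043],
    "ap-southeast-2": [0.060, 0.058, 0.075, 0.061, 0.059],
}

# Rank regions by mean price; the standard deviation hints at how often
# a low bid would be outbid (price spikes) in that region.
stats = {r: (mean(p), pstdev(p)) for r, p in samples.items()}
for region, (avg, sd) in sorted(stats.items(), key=lambda kv: kv[1][0]):
    print(f"{region:15s} mean=${avg:.4f}/h  stdev=${sd:.4f}")
```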
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00072
Title: TýrFS: Increasing Small Files Access Performance with Dynamic Metadata Replication
Authors: Pierre Matri, María S. Pérez, Alexandru Costan, Gabriel Antoniu
Abstract: Small files are known to pose major performance challenges for file systems. Yet such workloads are increasingly common in a number of big data analytics workflows and large-scale HPC simulations. These challenges are mainly caused by the common architecture of most state-of-the-art file systems, which need one or more metadata requests before being able to read from a file; as file size decreases, the relative overhead of this metadata management grows. In this paper we propose a set of techniques leveraging consistent hashing and dynamic metadata replication to significantly reduce this metadata overhead. We implement these techniques in a new file system named TýrFS, built as a thin layer above the Týr object store. We show that TýrFS increases small-file access performance by up to one order of magnitude compared to other state-of-the-art file systems, while having only minimal impact on file write throughput.
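Consistent hashing, one of the two building blocks named above, lets a client locate the server responsible for a file's metadata by hashing alone, with no directory lookup round-trip. A minimal ring sketch (class and server names are hypothetical; TýrFS's replica placement and dynamic replication policy are omitted):

```python
import bisect
import hashlib

class Ring:
    """Minimal consistent-hash ring: each server owns many virtual
    points on a 64-bit ring; a key maps to the first point at or after
    its own hash, wrapping around."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (self._h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _h(s):
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def lookup(self, key):
        i = bisect.bisect(self._keys, self._h(key)) % len(self._ring)
        return self._ring[i][1]

ring = Ring(["meta-1", "meta-2", "meta-3"])
# Any client computes the same owner locally -- no metadata-server
# directory round-trip is needed before reading the file.
print(ring.lookup("/data/input/part-00042"))
```

Virtual nodes keep the key space evenly spread, and adding or removing a server remaps only the keys adjacent to its points, which is what makes dynamic replication cheap to layer on top.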