Shiping Chen, Danan Thilakanathan, Donna Xu, S. Nepal, R. Calvo
Although content sharing provides many benefits, content owners lose full control of their content once it is given away. Existing solutions offer limited content access control because they are vendor-specific, unstructured and inflexible. In this paper, we present an open and flexible software solution called SelfProtect Object (SPO). SPO bundles content and policy files in an object that can protect its contents by itself anywhere and anytime. Our policy is based on XACML, a generic policy language allowing fine-grained access control with rules and conditions. We also design and implement a prototype of SPO and demonstrate its capability through examples. Our solution is flexible enough to express a variety of access control rules and open enough to integrate into different applications on different platforms.
{"title":"Self Protecting Data Sharing Using Generic Policies","authors":"Shiping Chen, Danan Thilakanathan, Donna Xu, S. Nepal, R. Calvo","doi":"10.1109/CCGrid.2015.84","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.84","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"31 1","pages":"1197-1200"},"PeriodicalIF":0.0,"publicationDate":"2015-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75029535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The widespread use of mass spectrometry for protein identification has created an urgent demand for more computationally efficient matching of mass spectrometry data to protein databases. With the rapid development of chip technology and parallel computing techniques, such as multi-core processors, many-core coprocessors and multi-node clusters, the speed and performance of the major mass spectral search engines are continuously improving. Over the past ten years, X!Tandem, a popular and representative open-source program for searching mass spectra, has been extended with several parallel versions that obtain considerable speedups. However, because these parallel strategies are mainly based on clusters of nodes, higher costs (e.g., electricity and maintenance charges) are needed to get limited speedups. Fortunately, the Intel Many Integrated Core (MIC) architecture and Graphics Processing Units (GPUs) are well suited to this problem. In this paper, we present and implement a parallel strategy for X!Tandem using MIC, called MIC-Tandem, which shows excellent speedups on commodity hardware and produces the same results as the original program.
{"title":"MIC-Tandem: Parallel X!Tandem Using MIC on Tandem Mass Spectrometry Based Proteomics Data","authors":"Pinjie He, Kenli Li","doi":"10.1109/CCGrid.2015.31","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.31","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"42 1","pages":"717-720"},"PeriodicalIF":0.0,"publicationDate":"2015-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90329808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feroz Zahid, Ernst Gunnar Gran, Bartosz Bogdanski, Bjørn Dag Johnsen, T. Skeie
InfiniBand (IB) is a widely used network interconnect for modern high-performance computing systems. In large IB fabrics, isolation of nodes is provided through partitioning. The routing algorithm, however, is unaware of these partitions in the network. Traffic flows belonging to different partitions might share links inside the network fabric. This sharing of intermediate links creates interference, which is particularly critical to avoid in multi-tenant environments like a cloud. In such systems, each tenant should experience predictable network performance, unaffected by the workload of other tenants. In addition, using current routing schemes, routes crossing partition boundaries are considered when distributing routes onto links in the network, despite the fact that these routes will never be used. The result is degraded load-balancing. In this paper, we present a novel partition-aware fat-tree routing algorithm, pFTree. The pFTree algorithm utilizes several mechanisms to provide network-wide isolation of partitions belonging to different tenant groups. Given the available network resources, pFTree starts by isolating partitions at the physical link level, and then moves on to utilize virtual lanes, if needed. Our experiments and simulations show that pFTree is able to significantly reduce the effect of inter-partition interference without any additional functional overhead. Furthermore, pFTree also provides improved load-balancing over the de facto standard IB fat-tree routing algorithm.
{"title":"Partition-Aware Routing to Improve Network Isolation in Infiniband Based Multi-tenant Clusters","authors":"Feroz Zahid, Ernst Gunnar Gran, Bartosz Bogdanski, Bjørn Dag Johnsen, T. Skeie","doi":"10.1109/CCGrid.2015.96","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.96","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"28 1","pages":"189-198"},"PeriodicalIF":0.0,"publicationDate":"2015-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88473759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Flash memory SSD has emerged as a promising storage medium and fits naturally as a cache between the system RAM and the disk due to its performance/cost characteristics. Managing such an SSD cache is challenging, and traditional cache replacement algorithms do not work well because of SSDs' asymmetric read/write performance and wear issues. This paper presents a new cache replacement algorithm, referred to as F/M-CIP, that greatly accelerates disk I/O. The idea is to divide the traditional LRU list into four parts: candidate-list, SSD-list, RAM-list and eviction-buffer-list. Upon a cache miss, the metadata of the missed block is conservatively inserted into the candidate-list but the data itself is not cached. The block in the candidate-list is then conservatively promoted to the RAM-list upon the k-th miss. At the bottom of the RAM-list, the eviction-buffer accumulates LRU blocks to be written into the SSD cache in batches to exploit the internal parallelism of the SSD. The SSD-list is managed using a combination of recency and frequency replacement policies by means of conservative promotion upon hits. To quantitatively evaluate the performance of F/M-CIP, a prototype has been built in the Linux kernel at the generic block layer. Experimental results on standard benchmarks and real-world traces show that F/M-CIP accelerates disk I/O performance by up to an order of magnitude compared to traditional hard disk storage and by up to a factor of 3 compared to the traditional SSD cache algorithm in terms of application execution time. Furthermore, F/M-CIP substantially reduces write operations to the SSD, implying prolonged durability.
{"title":"F/M-CIP: Implementing Flash Memory Cache Using Conservative Insertion and Promotion","authors":"J. Yang, Q. Yang","doi":"10.1109/CCGrid.2015.119","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.119","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"256 1","pages":"81-90"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73554132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
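The F/M-CIP abstract above describes a flow of blocks from the candidate-list to the RAM-list, then through the eviction-buffer into the SSD in batches. The following is a minimal Python sketch of that flow, not the authors' kernel implementation. The class name `CipCacheSketch`, the parameters `ram_capacity`, `batch_size` and `candidate_capacity`, and the returned status strings are all hypothetical. SSD capacity and the SSD-list replacement policy are left out; the sketch only records hit counts and recency for SSD blocks.

```python
from collections import OrderedDict


class CipCacheSketch:
    """Toy model of conservative insertion and promotion (illustrative only)."""

    def __init__(self, k=2, ram_capacity=2, batch_size=2, candidate_capacity=8):
        self.k = k                        # misses needed before data is cached
        self.ram_capacity = ram_capacity  # max blocks held in the RAM-list
        self.batch_size = batch_size      # blocks per batched SSD write
        self.candidate_capacity = candidate_capacity
        self.candidates = OrderedDict()   # block -> miss count (metadata only)
        self.ram = OrderedDict()          # block -> None, oldest entry first
        self.eviction_buffer = []         # RAM victims awaiting an SSD write
        self.ssd = OrderedDict()          # block -> hit count
        self.ssd_batches = []             # log of batched SSD writes

    def access(self, block):
        if block in self.ram:
            self.ram.move_to_end(block)   # refresh recency in the RAM-list
            return "ram-hit"
        if block in self.eviction_buffer:
            return "buffer-hit"           # still in memory, not yet on SSD
        if block in self.ssd:
            self.ssd[block] += 1          # frequency information
            self.ssd.move_to_end(block)   # recency information
            return "ssd-hit"
        return self._miss(block)

    def _miss(self, block):
        misses = self.candidates.pop(block, 0) + 1
        if misses < self.k:
            # Conservative insertion: remember the metadata, cache no data.
            self.candidates[block] = misses
            if len(self.candidates) > self.candidate_capacity:
                self.candidates.popitem(last=False)
            return "miss"
        # k-th miss: promote the block into the RAM-list.
        self.ram[block] = None
        while len(self.ram) > self.ram_capacity:
            victim, _ = self.ram.popitem(last=False)
            self.eviction_buffer.append(victim)
            if len(self.eviction_buffer) >= self.batch_size:
                self._flush()
        return "miss-promoted"

    def _flush(self):
        # Write the accumulated LRU blocks to the SSD as one batch.
        batch = list(self.eviction_buffer)
        self.eviction_buffer.clear()
        self.ssd_batches.append(batch)
        for block in batch:
            self.ssd[block] = 0
```

Calling `access` repeatedly with block identifiers and then inspecting `ssd_batches` shows how blocks seen only once never reach the SSD, which is the write-reduction effect the abstract attributes to conservative insertion.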
Anna Giannakou, Louis Rilling, Jean-Louis Pazat, Frédéric Majorczyk, C. Morin
Traditional intrusion detection systems are not adaptive enough to cope with the dynamic characteristics of cloud-hosted virtual infrastructures. This makes them unable to address new cloud-oriented security issues. In this paper we introduce SAIDS, a self-adaptable intrusion detection system tailored for cloud environments. SAIDS is designed to re-configure its components based on environmental changes. A prototype of SAIDS is described.
{"title":"Towards Self Adaptable Security Monitoring in IaaS Clouds","authors":"Anna Giannakou, Louis Rilling, Jean-Louis Pazat, Frédéric Majorczyk, C. Morin","doi":"10.1109/CCGrid.2015.133","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.133","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"81 1","pages":"737-740"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74173136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computing models provide the parallel and distributed algorithms for the cloud. The ability to estimate the performance of parallel computing models is critical for efficient resource scheduling. Current techniques for predicting performance are mostly based on analysis and simulation. The behavior of each parallel computing model leads directly to a different mathematical model. Without a general prediction model, it is very hard to fairly compare different parallel computing models in several critical aspects, including computing capacity, resource configuration, scalability and fault tolerance. In this paper, we design a mathematical model for predicting performance using a queuing system. We treat the various computing models as a service system in order to hide their diversity. The performance can be accurately estimated from the job waiting time and the job execution time. The heterogeneity of computing nodes can also be taken into account.
{"title":"Predicting the Performance of Parallel Computing Models Using Queuing System","authors":"Chao Shen, W. Tong, Samina Kausar","doi":"10.1109/CCGrid.2015.92","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.92","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"60 1","pages":"757-760"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75643744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
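The queuing abstract above estimates performance from the time a job waits plus the time it takes to run. The abstract does not say which queue it uses, so the following sketch assumes the simplest textbook case, a single-server M/M/1 queue with Poisson arrivals and exponential service times. The function name `mm1_response_time` and its parameters are hypothetical and are not taken from the paper.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Return (waiting, service, total) mean times for an M/M/1 queue.

    arrival_rate: mean jobs arriving per unit time (lambda).
    service_rate: mean jobs completed per unit time (mu).
    The M/M/1 assumption is ours, chosen only to illustrate the idea.
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate must be below service_rate")
    # Mean time in the queue before service starts: lambda / (mu * (mu - lambda)).
    waiting = arrival_rate / (service_rate * (service_rate - arrival_rate))
    # Mean time being served: 1 / mu.
    service = 1.0 / service_rate
    return waiting, service, waiting + service
```

Under these assumptions the total simplifies to 1 / (mu - lambda), so the estimate grows sharply as the arrival rate approaches the service rate. A model of heterogeneous nodes, which the abstract mentions, would need a richer queue than this one.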
Y. Liu, N. Rameshan, Enric Monte-Moreno, Vladimir Vlassov, Leandro Navarro-Moldes
Provisioning stateful services in the Cloud that guarantee high quality of service with reduced hosting cost is challenging. There are two typical auto-scaling approaches: predictive and reactive. A prediction-based controller leaves the system enough time to react to workload changes, while a feedback-based controller scales the system with better accuracy. In this paper, we show the limitations of using a proactive or reactive approach in isolation to scale a stateful system and the overhead involved. To overcome the limitations, we implement an elasticity controller, ProRenaTa, which combines both reactive and proactive approaches to leverage their respective advantages and also implements a data migration model to handle the scaling overhead. We show that the combination of reactive and proactive approaches outperforms the state-of-the-art approaches. Our experiments with a Wikipedia workload trace indicate that ProRenaTa guarantees a high level of SLA commitments while improving the overall resource utilization.
{"title":"ProRenaTa: Proactive and Reactive Tuning to Scale a Distributed Storage System","authors":"Y. Liu, N. Rameshan, Enric Monte-Moreno, Vladimir Vlassov, Leandro Navarro-Moldes","doi":"10.1109/CCGrid.2015.26","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.26","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"108 1","pages":"453-464"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75877063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale multiprocessor arrays suffer from frequent hardware defects or soft faults due to overheating, overload or occupancy by other running applications. To obtain a fault-free logical array, reconfiguration techniques have been proposed that reuse the fault-free processing elements (PEs) by changing the interconnection among PEs. Previous research has addressed this topic but assumes that switches and links are fault-free. In this paper, we consider faults not only on the PEs but also on the switches and links, and develop efficient algorithms to construct logical arrays that are as large as possible with optimized network length. To deal with the faults on switches and links, an efficient pre-processing procedure is designed, in which switch faults are transformed into link faults, and then faulty links are classified into several categories for handling. Then, we propose an efficient algorithm, A-MLA, to produce as many logical columns as possible, which are then combined to form a two-dimensional processor array. After that, we propose an algorithm, A-TMLA, to reduce the interconnection length of the logical array obtained by A-MLA, since shorter interconnects lead to lower communication latency and power consumption. Extensive experimental results show that, even with switch faults and link faults, our approach can produce larger logical fault-free arrays with shorter interconnection length, compared to the state of the art.
{"title":"Reconfigurations for Processor Arrays with Faulty Switches and Links","authors":"W. Jigang, Longting Zhu, Peilan He, Guiyuan Jiang","doi":"10.1109/CCGrid.2015.47","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.47","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"86 1","pages":"141-148"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74536934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ning Liu, Xi Yang, Xian-He Sun, John Jenkins, R. Ross
Despite the popularity of the Apache Hadoop system, its success has been limited by issues such as single points of failure, centralized job/task management, and lack of support for programming models other than MapReduce. The next generation of Hadoop, Apache Hadoop YARN, is designed to address these issues. In this paper, we propose YARNsim, a simulation system for Hadoop YARN. YARNsim is based on parallel discrete event simulation and provides protocol-level accuracy in simulating key components of YARN. YARNsim provides a virtual platform on which system architects can evaluate the design and implementation of Hadoop YARN systems. Also, application developers can tune job performance and understand the tradeoffs between different configurations, and Hadoop YARN system vendors can evaluate system efficiency under limited budgets. To demonstrate the validity of YARNsim, we use it to model two real systems and compare the experimental results from YARNsim and the real systems. The experiments include standard Hadoop benchmarks, synthetic workloads, and a bioinformatics application. The results show that the error rate is within 10% for the majority of test cases. The experiments prove that YARNsim can provide what-if analysis for system designers in a timely manner and at minimal cost compared with testing and evaluating on a real system.
{"title":"YARNsim: Simulating Hadoop YARN","authors":"Ning Liu, Xi Yang, Xian-He Sun, John Jenkins, R. Ross","doi":"10.1109/CCGrid.2015.61","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.61","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"21 1","pages":"637-646"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73403525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The successive over-relaxation (SOR) algorithm is an important method for solving the linear equations in the numerical marine forecasting model, but it does not lend itself to parallelization. In this paper, red-black ordering and a method for avoiding communication cost are used to parallelize the SOR algorithm and improve its performance. The experiments show that the parallel SOR algorithm with red-black ordering and communication optimization performs well, but the errors between the serial and parallel SOR algorithms grow as the number of computing time steps increases. Based on the characteristics of the numerical marine forecasting model, a four-step parallel SOR algorithm is designed to solve the error problem.
{"title":"Parallel Solving Method of SOR Based on the Numerical Marine Forecasting Model","authors":"Renbo Pang, Jianliang Xu, Yunquan Zhang","doi":"10.1109/CCGrid.2015.117","DOIUrl":"https://doi.org/10.1109/CCGrid.2015.117","url":null,"PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"24 1","pages":"733-736"},"PeriodicalIF":0.0,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75323330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
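The SOR abstract above relies on red-black ordering to make SOR parallelizable. The sketch below shows the generic technique on a small 2-D Laplace problem with fixed boundary values. It is not the marine forecasting model's equation system and not the authors' four-step algorithm. The function name `red_black_sor` and the parameters `omega` and `sweeps` are hypothetical, and the loops run serially here.

```python
def red_black_sor(grid, omega=1.5, sweeps=200):
    """Red-black SOR for the 2-D Laplace equation with fixed boundary values.

    grid: list of equal-length rows; the outer ring holds the boundary values.
    omega: relaxation factor, expected to lie strictly between 0 and 2.
    sweeps: number of full red-then-black passes.
    """
    rows, cols = len(grid), len(grid[0])
    u = [row[:] for row in grid]  # work on a copy, leave the input untouched
    for _ in range(sweeps):
        # Colour 0 ("red") has i + j even, colour 1 ("black") has i + j odd.
        for colour in (0, 1):
            # Every point of one colour reads only neighbours of the other
            # colour, so this inner double loop could be run in parallel.
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 != colour:
                        continue
                    neighbour_mean = 0.25 * (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    )
                    # Over-relax the Gauss-Seidel update by omega.
                    u[i][j] += omega * (neighbour_mean - u[i][j])
    return u
```

Because the update order differs from a row-by-row serial sweep, intermediate iterates of the two orderings are not identical. When only a limited number of sweeps is run per time step, such differences can persist in the output, which is one plausible source of the serial-versus-parallel discrepancy the abstract reports.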