This paper describes an extension of the Consensus Service proposed by Guerraoui and Schiper. The objective is to provide a standard way to implement agreement protocols resilient to Byzantine faults, using an intrusion-tolerant service built on virtual machine technology. This is achieved through the implementation of a Generic Consensus Service (GCS). Through client-server interaction, GCS cleanly separates the specifics of each agreement problem from consensus itself, so the consensus protocols used are fully independent of the problem-specific specializations. The framework also provides a set of properties and guarantees. The paper shows how GCS works, describes its general properties, and shows how it can be used to solve agreement problems such as reliable and atomic broadcast.
{"title":"Consensus Service to Solve Agreement Problems","authors":"G. Pieri, J. Fraga, L. Lung","doi":"10.1109/ICPADS.2010.81","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.81","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"1 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"125831003"}
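The GCS idea keeps problem-specific logic in a client layer that talks to a problem-independent consensus service. The sketch below is a minimal illustration of that separation, loosely in the spirit of the classic reduction of atomic broadcast to consensus. It is not the authors' implementation. `ConsensusService`, `FirstProposalConsensus`, and `AtomicBroadcastClient` are hypothetical names. The toy consensus object is not Byzantine fault tolerant; a real deployment would plug a resilient protocol in behind the same interface.

```python
from abc import ABC, abstractmethod


class ConsensusService(ABC):
    """Problem-independent interface: decide one value per consensus instance."""

    @abstractmethod
    def propose(self, instance: int, value):
        """Propose `value` for `instance` and return the decided value."""


class FirstProposalConsensus(ConsensusService):
    """Toy stand-in, NOT fault tolerant: the first proposal for an instance wins."""

    def __init__(self):
        self._decided = {}

    def propose(self, instance, value):
        return self._decided.setdefault(instance, value)


class AtomicBroadcastClient:
    """Problem-specific layer: totally orders messages by running one
    consensus instance per round on a batch of undelivered messages."""

    def __init__(self, consensus: ConsensusService):
        self.consensus = consensus
        self.pending = []    # messages this process wants to broadcast
        self.delivered = []  # messages delivered, in the agreed total order
        self.round = 0

    def broadcast(self, msg):
        self.pending.append(msg)

    def run_round(self):
        # Propose the undelivered pending messages in a deterministic order.
        proposal = tuple(sorted(m for m in self.pending if m not in self.delivered))
        decision = self.consensus.propose(self.round, proposal)
        self.round += 1
        # Every process delivers the same decided batch in the same order.
        for m in decision:
            if m not in self.delivered:
                self.delivered.append(m)
        # Messages that lost this round stay pending for a later round.
        self.pending = [m for m in self.pending if m not in self.delivered]
        return decision
```

Because the client only depends on `propose`, the consensus protocol can be swapped without touching the broadcast logic. That independence is what the abstract emphasizes.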
In extremely low-duty-cycle wireless sensor networks, a sender must wait until its receiver becomes active before forwarding a packet, which results in long end-to-end delay. Much prior work improves delivery ratio but does not consider energy-efficient delivery delay. In addition, unreliable links are another challenge in wireless sensor networks. Redundancy and multiple paths can be used to cope with unreliability, but neither is energy efficient, and both perform poorly in terms of delivery delay. In this work, we introduce a novel way of allocating erasure-coded blocks over multiple paths that reduces delivery delay energy-efficiently while achieving a comparably high delivery ratio. We evaluate our algorithm with extensive simulations. The evaluations show that our design greatly decreases delivery delay at the cost of a slight decrease in delivery ratio.
{"title":"Achieving Lower Delay with Energy Efficiency in Extremely Low-Duty-Cycle and Unreliable WSN","authors":"Yubo Yan, Panlong Yang, Lei Zhang, Xiaoming Tang","doi":"10.1109/ICPADS.2010.61","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.61","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"30 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"116629674"}
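To see why spreading erasure-coded blocks over several paths can pay off, consider a simple model. A message is coded into `n` blocks, any `k` of which suffice to decode it. Each path independently either works (and delivers every block sent on it) or fails. Under that model, the probability of decoding depends on how the blocks are split across paths.

The sketch below is a hypothetical illustration, not the paper's algorithm. The failure model and the function names are assumptions. Delay and duty-cycle wake-up times are not modeled; only the delivery-ratio side of the allocation is shown.

```python
from itertools import product


def delivery_probability(path_up, allocation, k):
    """Probability that at least k coded blocks reach the sink.

    Assumed model: path i is independently usable with probability path_up[i]
    and, when usable, delivers all allocation[i] blocks sent over it."""
    total = 0.0
    for states in product((True, False), repeat=len(path_up)):
        prob, received = 1.0, 0
        for up, p, n in zip(states, path_up, allocation):
            prob *= p if up else 1.0 - p
            if up:
                received += n
        if received >= k:
            total += prob
    return total


def best_allocation(path_up, n, k):
    """Exhaustively split n coded blocks (any k of which suffice) over the paths."""
    best_alloc, best_prob = None, -1.0
    for alloc in product(range(n + 1), repeat=len(path_up)):
        if sum(alloc) != n:
            continue  # fixed n keeps the transmission (energy) budget constant
        prob = delivery_probability(path_up, alloc, k)
        if prob > best_prob:
            best_alloc, best_prob = alloc, prob
    return best_alloc, best_prob
```

With paths that work with probability 0.9 and 0.5 and two blocks:
- With `k = 1`, the best choice is one block per path (probability 0.95).
- With `k = 2`, the best choice is both blocks on the reliable path (probability 0.9).

So the best split depends on the coding rate.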
The traditional client-server architecture widely adopted on the Internet cannot meet the increasing user loads and bandwidth demands of live streaming systems, especially for multimedia content delivery. Peer-to-peer (P2P) overlay networks provide excellent scalability and high resource utilization, which makes them an attractive solution to this problem. This paper considers a hybrid hierarchical P2P overlay network that consists of both super peers and normal peers. The media streaming architecture is built on a tree-structured network of super peers, and the tree construction process has a significant impact on overall system performance. We construct network cost models and formulate a Bandwidth Constrained Tree (BCT) construction problem, which aims to maximize the number of peers that satisfy a specified bandwidth constraint. We prove that BCT is NP-complete, and we propose optimal algorithms for two special cases and a heuristic approach for the general case. The superior performance of the proposed method is illustrated by an extensive set of experiments on simulated networks of various sizes, in comparison with existing greedy and degree-constrained algorithms.
{"title":"Bandwidth Constrained Tree Construction for Live Streaming Systems in P2P Networks","authors":"Yunyue Lin, C. Wu, Xukang Lu, Yi Gu","doi":"10.1109/ICPADS.2010.39","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.39","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"65 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130042191"}
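The abstract does not detail the general-case heuristic for BCT. As a point of reference, here is a minimal greedy, Prim-style construction that attaches peers one at a time over the widest available bottleneck and respects per-node fan-out limits. It then counts how many peers meet the bandwidth constraint.

This is an assumed baseline, not the authors' heuristic. The cost model is simplified to a per-link bandwidth table `link_bw` and a fan-out table `fanout`, and all names are hypothetical.

```python
def build_tree(source, peers, link_bw, fanout, min_bw):
    """Greedy, Prim-style tree construction (a widest-path heuristic).

    link_bw maps directed (parent, child) pairs to available bandwidth;
    fanout[node] caps how many children a node may serve. Each step attaches
    the unattached peer with the widest end-to-end bottleneck from the tree."""
    parent = {source: None}
    bottleneck = {source: float("inf")}  # min link bandwidth on the path from source
    children = {source: 0}
    remaining = list(peers)
    while remaining:
        best = None
        for u in parent:
            if children[u] >= fanout.get(u, 0):
                continue  # u has no spare fan-out
            for v in remaining:
                bw = min(bottleneck[u], link_bw.get((u, v), 0.0))
                if bw > 0 and (best is None or bw > best[0]):
                    best = (bw, u, v)
        if best is None:
            break  # no attachable peer left (no links or no spare fan-out)
        bw, u, v = best
        parent[v], bottleneck[v], children[v] = u, bw, 0
        children[u] += 1
        remaining.remove(v)
    satisfied = sum(1 for v in peers if v in parent and bottleneck[v] >= min_bw)
    return parent, satisfied
```

Because BCT is NP-complete, a greedy rule like this carries no optimality guarantee. It only gives a cheap baseline to compare against.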
Emerging participatory sensing applications introduce a privacy risk because users expose their location information. Most existing solutions preserve location privacy by generalizing a precise user location to a coarse-grained location, so they cannot be applied to applications that require fine-grained location information. To address this issue, we propose a novel method that preserves location privacy by anonymizing coarse-grained locations and retaining fine-grained locations using Attribute-Based Encryption (ABE). In addition, we do not assume that the service provider is a trustworthy entity, which makes our solution more practical for real-world applications. We present and analyze our security model and evaluate the performance and scalability of our system.
{"title":"Privacy Protection in Participatory Sensing Applications Requiring Fine-Grained Locations","authors":"Kai Dong, Tao Gu, Xianping Tao, Jian Lu","doi":"10.1109/ICPADS.2010.127","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.127","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"28 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"114895015"}
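In ABE, whether a reader can recover an encrypted fine-grained location depends on an access policy over attributes. In ciphertext-policy ABE, for example, decryption succeeds only if the reader's attribute set satisfies a tree of threshold gates.

The sketch below shows only that policy-satisfaction check, with all cryptography omitted. The `Leaf`/`Gate` structure and the attribute names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    attribute: str  # e.g. a role or region attribute (hypothetical names)


@dataclass
class Gate:
    threshold: int  # k-of-n gate: AND is n-of-n, OR is 1-of-n
    children: list


def satisfies(node, attributes):
    """True if the attribute set satisfies the threshold access tree."""
    if isinstance(node, Leaf):
        return node.attribute in attributes
    return sum(satisfies(child, attributes) for child in node.children) >= node.threshold
```

For example, the policy `Gate(2, [Leaf("researcher"), Gate(1, [Leaf("city:Nanjing"), Leaf("admin")])])` requires the `researcher` attribute plus at least one of the other two attributes.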
Human movement patterns can be valuable information for rehabilitation therapy, sports medicine, and monitoring elderly people, but acquiring them with multiple accelerometers makes the sensors uncomfortable to wear and the data processing complex. In this paper, a method that uses a single waist-mounted accelerometer to detect human movement patterns was investigated and evaluated. Ten subjects were asked to walk or run on a treadmill at a steady pace. A fifth-order Butterworth low-pass filter with a 20 Hz cutoff frequency was designed to denoise the acceleration data. A training data set was built from each subject's waist acceleration data, labeled with the treadmill speed. A Bayesian network classifier trained with the expectation-maximization (EM) algorithm was developed to assess movement patterns. Experiments showed that the method distinguishes walking from running with an accuracy of more than 90%. Comparable accuracy was achieved even with a single superior-inferior acceleration feature. Classification of fast versus normal-speed walking also gave satisfactory results. This indicates that applications that only need to distinguish walking from running can use a low-power, low-computational-complexity uniaxial accelerometer as the movement detector.
{"title":"A Pervasive Simplified Method for Human Movement Pattern Assessing","authors":"Mianbo Huang, Guoru Zhao, Lei Wang, Feng Yang","doi":"10.1109/ICPADS.2010.65","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.65","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"17 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"114647337"}
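The filtering step can be reproduced with SciPy's Butterworth design. The abstract does not give the accelerometer's sampling rate, so `fs` is left as a parameter; it must exceed 40 Hz for a 20 Hz cutoff to be valid.

Using zero-phase forward-backward filtering (`sosfiltfilt`) is an assumption made here to avoid phase lag. It squares the magnitude response, so the effective attenuation is stronger than a single fifth-order pass; use `sosfilt` instead if a causal single pass is required.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def lowpass_acceleration(samples, fs, cutoff_hz=20.0, order=5):
    """Denoise acceleration samples with a Butterworth low-pass filter.

    fs is the sampling rate in Hz (not stated in the source; an input here).
    Second-order sections are used for numerical stability."""
    if cutoff_hz >= fs / 2:
        raise ValueError("cutoff must be below the Nyquist frequency fs/2")
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    # Zero-phase filtering: forward and backward passes, no time shift.
    return sosfiltfilt(sos, np.asarray(samples, dtype=float))
```

For example, a 1 Hz gait-like component mixed with 40 Hz noise sampled at 200 Hz comes out almost identical to the clean 1 Hz signal away from the edges.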
A Vehicular Ad Hoc Network (VANET) is not only highly mobile and frequently disconnected, but may also have to deal with rapid changes in network topology, especially when accidents and traffic jams occur. In this paper, we propose a novel approach to address this issue. Because connectivity in a VANET is intermittent, we adopt the carry-and-forward idea, in which a moving vehicle carries a message until it can forward the message to another vehicle. Unlike existing carry-and-forward solutions for VANETs, our approach lets each vehicle use distributed, real-time estimates of the data traffic statistics of all roads. Based on the current estimated message delivery delay along each road, each vehicle can choose a routing path that reduces delay. We propose a distributed real-time data traffic statistics assisted routing protocol (DRTAR) that forwards the message to the appropriate road. Experimental results show that the proposed DRTAR protocol outperforms other solutions.
{"title":"Distributed Real-Time Data Traffic Statistics Assisted Routing Protocol for Vehicular Networks","authors":"Xiaoming Wang, Chao Song","doi":"10.1109/ICPADS.2010.57","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.57","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"19 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"134395281"}
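Once a vehicle holds per-road delay estimates, choosing a forwarding path reduces to a shortest-path problem. The graph's nodes are intersections, and each edge weight is the estimated delivery delay along that road.

The sketch below runs standard Dijkstra over such a local view. How DRTAR actually gathers and updates the estimates is not described in the abstract, so the `roads` table and its delay values are hypothetical inputs.

```python
import heapq


def min_delay_path(roads, source, target):
    """Dijkstra over intersections.

    roads maps an intersection to a list of (neighbor, estimated_delay) pairs,
    i.e. one vehicle's current view of per-road delivery delay."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, delay in roads.get(u, []):
            nd = d + delay
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")  # target unreachable in this view
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]
```

When new traffic statistics arrive, a vehicle would simply rerun the search with the updated delays.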
Peer-to-Peer (P2P) technologies are developing rapidly and have gained popularity. Applying P2P to areas such as file sharing, collaborative business environments, and distributed computing requires secure communication among the nodes. A cluster-based P2P structure provides an efficient way to support file sharing and distributed computing. A key agreement protocol is a set of communication rules by which two users establish a shared common key, which they then use for subsequent secure communication. Supervising communication between two nodes is an important topic, especially in government affairs. This paper provides a framework that supports a supervising mechanism in cluster-based P2P networks, based on the concept of a two-key agreement protocol. The mechanism uses a hash-based two-key agreement protocol to let higher-level nodes supervise lower-level nodes in a cluster-based P2P communication environment. A global cluster head supervises the whole network, and each cluster head supervises the communication within its own cluster. Security analysis shows that the proposed mechanism is sufficiently secure for P2P use. Any two nodes in the same cluster generate their common session key by themselves, and no other node in that cluster except the cluster head can obtain this session key. Moreover, the mechanism uses only two kinds of operations, hashing and XOR, so it provides an efficient way to supervise the P2P network.
{"title":"A Two-Key Agreement Based Supervising Mechanism for Cluster-Based Peer-to-Peer Applications","authors":"Chun-Chieh Huang, Chi-Chun Lo","doi":"10.1109/ICPADS.2010.51","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.51","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"41 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"123441666"}
As computer architectures grow more complex, domain-specific program generators are widely used to implement performance-portable libraries. Dynamic programming is a performance-critical kernel in many applications, including engineering operations and bioinformatics. In this paper, we propose Automatically Tuned Dynamic Programming (ATDP), which optimizes the performance of dynamic programming algorithms across various architectures. First, we design an algorithm-by-blocks for dynamic programming that enables well-known optimizations such as cache and register tiling. The parameterized algorithm-by-blocks is then combined with an auto-tuning framework that uses hill climbing to search for the best-performing program on a given platform. Experiments on two x86 processors show that (i) the generated scalar programs improve performance by more than 10 times, and (ii) the vector programs further speed up the scalar ones by factors of 4 and 2 for single precision and double precision, respectively.
{"title":"Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks","authors":"Jiajia Li, Guangming Tan, Mingyu Chen","doi":"10.1109/ICPADS.2010.117","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.117","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"15 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"122217347"}
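The two ingredients of ATDP can be illustrated on a small scale: a dynamic-programming kernel computed tile by tile, and a hill-climbing search over the tile size. The abstract does not say which recurrences ATDP targets, so longest common subsequence (LCS) is used here as a stand-in.

The tuner explores only power-of-two tile sizes and a single parameter. The real framework also tunes register tiling and vectorization, and pure-Python timings will not reflect the cache effects of generated C code. All names are hypothetical.

```python
import time


def lcs_blocked(a, b, tile):
    """LCS length computed tile by tile. Visiting tiles in row-major order
    respects the DP dependencies (up, left, up-left), so results are exact."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for bi in range(1, n + 1, tile):
        for bj in range(1, m + 1, tile):
            for i in range(bi, min(bi + tile, n + 1)):
                for j in range(bj, min(bj + tile, m + 1)):
                    if a[i - 1] == b[j - 1]:
                        dp[i][j] = dp[i - 1][j - 1] + 1
                    else:
                        dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def measure(tile, a, b, repeats=3):
    """Best-of-N wall-clock time for one tile size."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        lcs_blocked(a, b, tile)
        best = min(best, time.perf_counter() - t0)
    return best


def hill_climb(cost, start, lo, hi):
    """Move to the cheaper of the halved/doubled tile size until no neighbor improves."""
    current, best = start, cost(start)
    while True:
        neighbors = [t for t in (current // 2, current * 2) if lo <= t <= hi]
        if not neighbors:
            return current, best
        c, t = min((cost(t), t) for t in neighbors)
        if c >= best:
            return current, best  # local optimum
        current, best = t, c
```

A typical call is `hill_climb(lambda t: measure(t, a, b), 8, 1, 512)`. Like any hill climber, it can stop at a local optimum, which is the usual trade-off against exhaustive search.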
An appropriate automatic thread decomposition approach is critical for pipelined multithreading (PMT) to maximize pipeline performance by balancing thread sizes on the target multi-core processor. This paper presents an automatic thread decomposition approach that maps the pipeline thread decomposition problem onto a graph-theoretic framework. The goal is to construct an optimized DAG with minimal bottleneck node size and balanced node sizes, subject to a constraint on the number of cores. In this approach, control dependence is treated as a special kind of data dependence, and an effective mechanism is proposed to remove redundant control dependences. A heuristic decomposition algorithm is then given to generate an optimized pipeline. The algorithm has been evaluated on a commodity multi-core processor, and experimental results show that it achieves speedups ranging from 113% to 174% on several SPEC CPU 2000 benchmark programs.
{"title":"Automatic Thread Decomposition for Pipelined Multithreading","authors":"Yuanming Zhang, K. Ootsu, T. Yokota, T. Baba","doi":"10.1109/ICPADS.2010.18","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.18","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"17 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"122251155"}
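The "minimal bottleneck under a core limit" objective can be illustrated on a simplified case. Assume the dependence graph has already been collapsed into nodes listed in a valid topological order, and that each pipeline stage must be a contiguous run of them. Minimizing the largest stage then becomes the classic linear-partition problem, solvable by binary search on the bottleneck.

This simplification and its names are assumptions for illustration. The paper's heuristic works on a general DAG and also removes redundant control dependences.

```python
def min_bottleneck_partition(weights, cores):
    """Split topologically ordered node weights into at most `cores`
    contiguous pipeline stages, minimizing the largest stage weight.

    Returns (stages as lists of node indices, bottleneck weight)."""
    if not weights:
        return [], 0

    def stages_needed(limit):
        count, load = 1, 0
        for w in weights:
            if load + w > limit:
                count, load = count + 1, w  # start a new stage
            else:
                load += w
        return count

    lo, hi = max(weights), sum(weights)  # bottleneck lies in this range
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(mid) <= cores:
            hi = mid
        else:
            lo = mid + 1

    stages, current, load = [], [], 0
    for idx, w in enumerate(weights):
        if current and load + w > lo:
            stages.append(current)
            current, load = [], 0
        current.append(idx)
        load += w
    stages.append(current)
    return stages, lo
```

For node weights `[4, 2, 3, 7, 1, 5]` on three cores, this yields stages `[[0, 1, 2], [3, 4], [5]]` with a bottleneck of 9.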
Konstantinos I. Karantasis, E. D. Polychronopoulos, J. Ekaterinaris
The recent advent of multicore processors, and especially the introduction of many-core GPUs, opens new horizons for large-scale, high-resolution simulations in a broad range of scientific fields. Among them, computational fluid dynamics (CFD) appears to be one of the fields that could benefit most from many-core GPUs. To investigate this potential, we evaluate the performance of a high-order accurate method for simulating compressible flows. The implementation runs on a GPU cluster, but it follows a novel approach to using GPU clusters that does not involve explicit message passing. Instead, it relies on Software Distributed Shared Memory (SDSM) to propagate changes across the simulation phases. The first results are encouraging and lay the groundwork for further research on using shared-memory abstractions to exploit future GPU clusters.
{"title":"Acceleration of a High Order Accurate Method for Compressible Flows on SDSM Based GPU Clusters","authors":"Konstantinos I. Karantasis, E. D. Polychronopoulos, J. Ekaterinaris","doi":"10.1109/ICPADS.2010.107","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.107","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"166 1","pages":"0"},"publicationDate":"2010-12-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"122320126"}