L. Weng, G. Agrawal, Ümit V. Çatalyürek, T. Kurç, S. Narayanan, J. Saltz
Analysis of large and/or geographically distributed scientific datasets is emerging as a key component of grid computing. One challenge in this area is that scientific datasets are typically stored as binary or character flat files, which makes specifying the processing much harder. In view of this, there has been recent interest in data virtualization and in data services that support such virtualization. This paper presents an approach for automatically creating data services to support data virtualization. Specifically, we show how a relational-table-like data abstraction can be supported for complex multidimensional scientific datasets that reside on a cluster. We have designed and implemented a tool that processes SQL queries (with SELECT and WHERE clauses) on multidimensional datasets. We have designed a metadata description language that is used for specifying the data layout. From such a description, our tool automatically generates efficient data subsetting and access functions. We have extensively evaluated our system. The key observations from our experiments are as follows. First, our tool can correctly and efficiently handle a variety of different data layouts. Second, our system scales well as the number of nodes or the amount of data is increased. Third, the performance of the automatically generated code for indexing and contracting functions is quite comparable to the performance of hand-written code.
{"title":"An approach for automatic data virtualization","authors":"L. Weng, G. Agrawal, Ümit V. Çatalyürek, T. Kurç, S. Narayanan, J. Saltz","doi":"10.1109/HPDC.2004.2","DOIUrl":"https://doi.org/10.1109/HPDC.2004.2","journal":{"name":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004."},"publicationDate":"2004-06-04","platform":"Semanticscholar","paperid":"124466466"}
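The idea in the data virtualization abstract above, generating access functions from a layout description so that a flat file can be queried like a table, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `LAYOUT` descriptor format, the `make_reader` and `select` helpers, and the fixed-size little-endian record layout are hypothetical and are not the authors' metadata language or generated code.

```python
import os
import struct
import tempfile

# Hypothetical layout descriptor: (field name, struct format code) per column.
LAYOUT = [("x", "i"), ("y", "i"), ("temp", "d")]


def make_reader(layout):
    """Build a row iterator for fixed-size little-endian binary records."""
    names = [name for name, _ in layout]
    record = struct.Struct("<" + "".join(code for _, code in layout))

    def read_rows(path):
        with open(path, "rb") as f:
            while True:
                chunk = f.read(record.size)
                if len(chunk) < record.size:
                    break  # end of file (or a truncated trailing record)
                yield dict(zip(names, record.unpack(chunk)))

    return read_rows


def select(rows, columns, where):
    """Rough equivalent of: SELECT columns FROM rows WHERE where(row)."""
    return [tuple(row[c] for c in columns) for row in rows if where(row)]


def demo():
    """Write a 3x3 grid to a flat file, then query it through the table view."""
    record = struct.Struct("<iid")
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
        for x in range(3):
            for y in range(3):
                f.write(record.pack(x, y, 10.0 * x + y))
        path = f.name
    try:
        read_rows = make_reader(LAYOUT)
        # SELECT x, temp WHERE y = 1 AND x >= 1
        return select(read_rows(path), ["x", "temp"],
                      lambda r: r["y"] == 1 and r["x"] >= 1)
    finally:
        os.remove(path)
```

The sketch scans every record; the tool described in the abstract additionally generates indexing functions so that a query need not read the whole dataset.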
Xi Zhang, T. Kurç, T. Pan, Ümit V. Çatalyürek, S. Narayanan, P. Wyckoff, J. Saltz
Hash-based join is a compute- and memory-intensive algorithm. It achieves good performance and scales well to large datasets if sufficient memory is available to hold the hash table and the computing load is balanced across nodes. We compare three adaptive algorithms that start with a partitioning of the hash table across a group of nodes and expand to additional resources during the hash table building phase when memory on a node is used up. The split-based algorithm partitions the hash table range assigned to the node whose memory is full into two segments and assigns one of the segments to a new node in the system. The replication-based algorithm replicates the hash table range on a new node. The hybrid algorithm combines the first and second strategies in order to address each strategy's shortcomings. We perform an experimental performance evaluation of these algorithms on a PC cluster. Our results show that, among the three algorithms, in most cases the hybrid algorithm either performs close to the better of the other two or is the best algorithm.
{"title":"Strategies for using additional resources in parallel hash-based join algorithms","authors":"Xi Zhang, T. Kurç, T. Pan, Ümit V. Çatalyürek, S. Narayanan, P. Wyckoff, J. Saltz","doi":"10.1109/HPDC.2004.34","DOIUrl":"https://doi.org/10.1109/HPDC.2004.34","journal":{"name":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004."},"publicationDate":"2004-06-04","platform":"Semanticscholar","paperid":"127978298"}
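The split-based strategy from the hash-join abstract above can be sketched as follows. This is a single-process simulation under stated assumptions, not the authors' implementation: the `Node` class, the `NUM_BUCKETS` hash space, the modulo hash function, and the choice to split a range at its midpoint are all hypothetical.

```python
from dataclasses import dataclass, field

NUM_BUCKETS = 16  # assumed size of the hash value space


@dataclass
class Node:
    lo: int          # node owns hash values in [lo, hi)
    hi: int
    capacity: int    # maximum number of tuples the node can hold
    table: dict = field(default_factory=dict)  # hash value -> [(key, value)]

    def size(self):
        return sum(len(rows) for rows in self.table.values())


def bucket(key):
    return key % NUM_BUCKETS


def owner(nodes, h):
    return next(n for n in nodes if n.lo <= h < n.hi)


def insert(nodes, key, value, capacity):
    """Insert one build-side tuple, splitting the owner's range when it is full."""
    h = bucket(key)
    node = owner(nodes, h)
    # A single-bucket range cannot be split further; in that case the tuple is
    # stored anyway, which is the kind of skew a split-only scheme cannot fix.
    while node.size() >= node.capacity and node.hi - node.lo > 1:
        mid = (node.lo + node.hi) // 2
        new = Node(mid, node.hi, capacity)
        for b in [b for b in node.table if b >= mid]:
            new.table[b] = node.table.pop(b)  # migrate the upper segment
        node.hi = mid
        nodes.append(new)
        node = owner(nodes, h)
    node.table.setdefault(h, []).append((key, value))


def build(tuples, capacity):
    nodes = [Node(0, NUM_BUCKETS, capacity)]
    for key, value in tuples:
        insert(nodes, key, value, capacity)
    return nodes


def probe(nodes, key):
    """Return the build-side values that match a probe key."""
    h = bucket(key)
    return [v for k, v in owner(nodes, h).table.get(h, []) if k == key]
```

For example, `build([(k, f"r{k}") for k in range(12)], capacity=4)` ends with three nodes owning the ranges [0, 4), [4, 8), and [8, 16). A replication-based variant would instead give the new node a copy of the same range and direct later inserts to it, at the cost of probing both copies.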
Grid computing has excited many with the promise of access to huge amounts of resources distributed across the globe. However, there are no widely adopted solutions for automatically assembling grids, and this limits the scale of today's grids. Some argue that this is due to the overwhelming complexity of the proposed economy-based solutions. Peer-to-peer grids have emerged as a less complex alternative. We are currently deploying OurGrid, one such peer-to-peer grid. OurGrid is a CPU-sharing grid that targets bag-of-tasks applications (i.e., parallel applications whose tasks are independent). In order to ease system deployment, OurGrid is based on a very lightweight autonomous reputation scheme. Free riding is an important issue for any peer-to-peer system. The aim is to show that OurGrid's reputation system successfully discourages free riding, making it in each peer's own interest to collaborate with the peer-to-peer community. We show this in two steps. First, we analyze the conditions under which a reputation scheme can discourage free riding in a CPU-sharing grid. Second, we show that OurGrid's reputation scheme satisfies these conditions, even in the presence of malicious peers. Unlike other distributed mechanisms for discouraging free riding, OurGrid's reputation scheme achieves this without requiring a shared cryptographic infrastructure or specialized storage.
{"title":"Discouraging free riding in a peer-to-peer CPU-sharing grid","authors":"N. Andrade, F. Brasileiro, W. Cirne, M. Mowbray","doi":"10.1109/HPDC.2004.9","DOIUrl":"https://doi.org/10.1109/HPDC.2004.9","journal":{"name":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004."},"publicationDate":"2004-06-04","platform":"Semanticscholar","paperid":"132736264"}
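The abstract above does not spell out how the reputation scheme works, so the following is only a sketch of one way a lightweight, autonomous scheme could discourage free riding: each peer keeps a purely local ledger of CPU time received from and donated to other peers, and serves the requester with the highest balance first. The `Peer` class, its method names, and the balance rule are assumptions made for illustration, not OurGrid's actual mechanism.

```python
class Peer:
    """A peer that ranks requesters using only its own local observations."""

    def __init__(self, name):
        self.name = name
        # other peer -> CPU hours received from it minus CPU hours given to it,
        # never allowed to drop below zero (assumed rule)
        self.balance = {}

    def record_received(self, donor, cpu_hours):
        self.balance[donor] = self.balance.get(donor, 0.0) + cpu_hours

    def record_given(self, consumer, cpu_hours):
        current = self.balance.get(consumer, 0.0)
        self.balance[consumer] = max(0.0, current - cpu_hours)

    def choose(self, requesters):
        """Pick the requester with the highest local balance (first one on ties)."""
        return max(requesters, key=lambda r: self.balance.get(r, 0.0))
```

Under this rule a peer that never donates keeps a balance of zero everywhere and is served only when no contributor is waiting, and no shared storage or cryptographic infrastructure is involved because each ledger is private to its owner.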
The WSRF specifications [Foster, I. et al., (2004)] represent the merging of "the Web" and "the grid". This poster describes a design to achieve compliance with the WS-ResourceFramework specifications using Microsoft .NET technologies. Our design seeks to leverage Microsoft tools wherever possible and to make WSRF-compliant services easy to program. While our work on OGSI.NET [Wasson, G. et al., (2004)] provides invaluable insight that guides the design of WSRF.NET, we feel that a different set of abstractions is necessary to capture the full potential of the WS-ResourceFramework. This poster describes our work to date on WSRF.NET. The poster discusses topics such as the implementation of WS-Resources, the WSRF.NET programming model, our security architecture, and our future release plans (including our first release at HPDC 13).
{"title":"WS-ResourceFramework on .NET","authors":"G. Wasson, N. Beekwilder, M. Morgan, M. Humphrey","doi":"10.1109/HPDC.2004.42","DOIUrl":"https://doi.org/10.1109/HPDC.2004.42","journal":{"name":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004."},"publicationDate":"2004-06-04","platform":"Semanticscholar","paperid":"122799531"}
Hash-based randomization is a powerful technique used in clusters and distributed systems for load management. It offers uniform distribution, efficient addressing, little shared state, and scalability. However, simple hash-based randomization is unable to deal with skew and heterogeneity and, therefore, cannot achieve load balance in many environments. Virtual processors have been proposed as a solution to simple randomization's problem. We evaluate an alternative load management scheme for heterogeneous, shared-disk clusters. Our scheme directly tunes hash-based randomized load placement using a technique called adaptive, nonuniform (ANU) randomization [2003] and compares favorably to the virtual processor approach. It provides the load balancing benefits of virtual processors with less shared state. It also automatically adapts to workload and cluster configuration changes, such as failure and recovery and adding or removing servers, without human involvement. Experimental results show that our scheme outperforms virtual processors and performs comparably to prescient load-balancing algorithms. They also show that our system maintains consistent performance across all servers while moving a minimal amount of load.
{"title":"Achieving performance consistency in heterogeneous clusters","authors":"Changxun Wu, R. Burns","doi":"10.1109/HPDC.2004.1","DOIUrl":"https://doi.org/10.1109/HPDC.2004.1","journal":{"name":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004."},"publicationDate":"2004-06-04","platform":"Semanticscholar","paperid":"116911235"}
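The contrast drawn in the abstract above, between uniform hash placement and placement that is tuned to server capability, can be illustrated with a simplified sketch. This is not the ANU randomization algorithm itself, whose details the abstract does not give: the `place` and `retune` functions, the interval-based mapping, and the latency-based tuning rule are all assumptions chosen for illustration.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def unit_hash(key):
    """Map a key to a pseudo-random point in [0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def place(key, servers, weights):
    """Assign a key to a server whose share of [0, 1) is proportional to its weight."""
    total = sum(weights)
    bounds = list(accumulate(w / total for w in weights))
    # min() guards against the last cumulative bound rounding to just below 1.0
    idx = min(bisect_right(bounds, unit_hash(key)), len(servers) - 1)
    return servers[idx]


def retune(weights, latencies):
    """Assumed tuning rule: scale each weight by mean latency / own latency,
    so servers that respond slowly receive a smaller share of new load."""
    mean = sum(latencies) / len(latencies)
    return [w * mean / lat for w, lat in zip(weights, latencies)]
```

With equal weights this reduces to plain uniform hashing. The only state that must be shared is the weight vector, which is the sense in which such schemes need little shared state; note that this naive interval mapping can move more keys than necessary when weights change, so it does not reproduce the minimal load movement the abstract reports.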
T. Haupt, Anand Kalyanasundaram, Nisreen Ammari, Archana Chilukuri, Maxim Khotournenko
The poster presents a successful implementation of SPURport, a prototype Grid portal for the earthquake engineering community. Developed as a part of the SPUR project, it extends the functionality of NEESgrid, which, in turn, is an application of OGSI/Globus 3.0. We found that the implementation of a Grid portal is much easier when one introduces high-level middle-tier services that aggregate and coordinate lower-level services provided by the Globus toolkit. For example, our high-level job submission service orchestrates the resolution of logical entities to physical ones, file transfers, and data streaming prior to the actual resource allocation. We found it very useful to employ application descriptors that facilitate automatic generation of RSL documents.
{"title":"SPURport","authors":"T. Haupt, Anand Kalyanasundaram, Nisreen Ammari, Archana Chilukuri, Maxim Khotournenko","doi":"10.1109/HPDC.2004.33","DOIUrl":"https://doi.org/10.1109/HPDC.2004.33","journal":{"name":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004."},"platform":"Semanticscholar","paperid":"128789056"}