Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945209
K. Czajkowski, A. K. Demir, C. Kesselman, M. Thiébaux
Computational grids are enabling collaboration between scientists and organizations to generate and archive extremely large datasets across shared, distributed resources. There is a need to visually explore such data throughout the life-cycle of projects. Practical exploration of large datasets requires visualization tools that can function in the same grid environment in which the data is created and stored. Resource management interfaces are an important structural component of grid computing environments because they enable uniform access to the wide variety of resources necessary for scientific work. We describe a new advance-reservation system for graphics resources and an application of existing grid technology to create general-purpose active storage systems. We report our experience with prototype infrastructure and application components, involving experiments coupling end-to-end resources for interactive visual exploration of large data in representative distributed environments.
Title: Practical resource management for grid-based visual exploration. Published in: Proceedings 10th IEEE International Symposium on High Performance Distributed Computing.
Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945214
Thomas Dramlitsch, Gabrielle Allen, E. Seidel
We discuss a set of novel techniques we are developing, which build on standard tools, to make distributed computing for large-scale simulations across multiple machines (even scattered across different continents) a reality. With these techniques we demonstrate that we are able to scale a tightly coupled scientific application in metacomputing environments. Such research and development in metacomputing will lead the way to routine, straightforward and efficient use of distributed computing resources anywhere around the world. This work applies not only to the large-scale simulations in astrophysics which provide the motivation for this work, but also opens the way for new, innovative application scenarios.
Title: Efficient techniques for distributed computing
Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945186
E. Weigle, Wu-chun Feng
Computational grids such as the Information Power Grid (Johnston et al., 1999), Particle Physics Data Grid, and Earth System Grid depend on TCP to provide reliable communication between nodes across a wide-area network (WAN). Of the available TCP implementations, TCP Reno and its variants are the most widely deployed; however, Reno's performance in computational grids is mediocre at best. Due to conflicting results in the evaluation of TCP implementations, we present a detailed simulation study that unifies the conflicting results and demonstrates the limitations of earlier work. We focus on the two most debated versions of TCP: Reno and Vegas. Using real traffic distributions, we show that Vegas performs well over modern high-performance links and better than Reno with the proper selection of the Vegas parameters α and β. Our results exhibit ways to significantly enhance the performance of distributed computational grids that rely on TCP.
Title: A case for TCP Vegas in high-performance computational grids
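The role of the Vegas parameters α and β mentioned in this abstract can be illustrated with a minimal sketch of the Vegas congestion-avoidance rule as it is commonly described. This is not the authors' simulation code: the function name `vegas_update`, the one-segment step size, and the default thresholds are assumptions chosen for illustration.

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """One congestion-avoidance step of a simplified TCP Vegas.

    cwnd     -- current congestion window, in segments
    base_rtt -- smallest round-trip time seen so far, in seconds
    rtt      -- most recent round-trip time, in seconds
    alpha, beta -- lower/upper thresholds on the estimated number of
                   segments queued in the network (illustrative defaults)
    """
    expected = cwnd / base_rtt               # rate if nothing were queued
    actual = cwnd / rtt                      # rate actually observed
    queued = (expected - actual) * base_rtt  # estimated backlog, in segments

    if queued < alpha:
        return cwnd + 1          # path looks underused: grow linearly
    if queued > beta:
        return max(cwnd - 1, 1)  # backlog building up: shrink linearly
    return cwnd                  # between alpha and beta: hold steady
```

Unlike Reno, which reacts to packet loss, this rule reacts to rising round-trip times, which is why the choice of α and β determines how aggressively the sender fills the link.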
Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945200
D. Thain, J. Basney, Se-Chang Son, M. Livny
Access to remote data is one of the principal challenges of Grid computing. While performing I/O, Grid applications must be prepared for server crashes, performance variations and exhausted resources. To achieve high throughput in such a hostile environment, applications need a resilient service that moves data while hiding errors and latencies. We illustrate this idea with Kangaroo, a simple data movement system that makes opportunistic use of disks and networks to keep applications running. We demonstrate that Kangaroo can achieve better end-to-end performance than traditional data movement techniques, even though its individual components do not achieve high performance.
Title: The Kangaroo approach to data movement on the Grid
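The idea of a service that "moves data while hiding errors and latencies" can be sketched as a writer that accepts data immediately into a local spool and retries delivery later. This is only an illustration of the general store-and-retry pattern, not Kangaroo's actual design or API; `SpoolingWriter`, `FlakyServer`, and the in-memory spool (standing in for a disk buffer) are hypothetical.

```python
from collections import deque


class FlakyServer:
    """Hypothetical destination that rejects the first `failures` sends."""

    def __init__(self, failures):
        self.failures = failures
        self.received = []

    def __call__(self, block):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated outage")
        self.received.append(block)


class SpoolingWriter:
    """Accept writes immediately; deliver them later, retrying on errors."""

    def __init__(self, send, max_attempts=5):
        self.send = send            # callable that delivers one block
        self.spool = deque()        # stand-in for a local disk buffer
        self.max_attempts = max_attempts

    def write(self, block):
        # The application never waits on the network here.
        self.spool.append(block)

    def drain(self):
        """Deliver spooled blocks in order; return how many got through."""
        delivered = 0
        while self.spool:
            block = self.spool[0]
            for _ in range(self.max_attempts):
                try:
                    self.send(block)
                    break
                except ConnectionError:
                    continue        # transient error: try again
            else:
                return delivered    # still failing: keep the rest spooled
            self.spool.popleft()
            delivered += 1
        return delivered
```

The application calls `write` and keeps computing; a background task would call `drain` periodically, so server crashes show up as delay rather than as application errors.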
Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945216
W. Johnston, S. Talwar, K. Jackson
Large-scale science and engineering is frequently done through the interaction of collaborating groups, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed. The overall motivation for "Grids" is to enable the routine interactions of these resources to enhance this type of large-scale science and engineering, and thus substantially increase the computing and data handling capabilities available to science and engineering projects. However, even if this environment works in every other way, it will not be viable if it is constantly disrupted by hackers and their kin. Distributed applications are potentially more vulnerable than conventional scientific problem solving environments because there are substantially more targets to attack in order to impact a single application. Much of the overall security of Grids is inherited from the security of the underlying systems. There are, however, some security considerations at the Grid level that are independent of the underlying systems, and we focus on this latter aspect.
Title: Overview of security considerations for computational and data grids
Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945202
J. Romein, H. Bal
The distributed searching of state spaces containing cycles is a challenging task and has been studied for several years. Traditional parallel search algorithms either ignore the cyclic nature of the state space and waste much time in duplicated search effort, or they rely on heavy communication to reduce duplicate work, resulting in a large communication overhead. Both methods perform poorly, even when using a fast, local interconnection. A recently-developed task distribution scheme, called transposition-driven scheduling (TDS), performs much better, since it communicates asynchronously and efficiently suppresses duplicate search effort. TDS, however, requires bandwidths of megabytes per second per processor. In this paper, we investigate how cyclic state spaces can be searched efficiently on a meta-computing system containing multiple clusters, connected by high-latency, low-bandwidth wide-area links. This is quite a challenge, because the wide-area links provide neither the bandwidth required for TDS nor the latency required for traditional distributed search algorithms. We propose a scheme that strongly reduces communication between clusters at the expense of some duplicate search effort. Performance measurements for several applications show that the new scheme outperforms traditional schemes by a wide margin.
Title: Wide-area transposition-driven scheduling
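To make the baseline concrete, the following single-process simulation sketches the basic transposition-driven scheduling idea as it is usually described: each state has a "home" worker chosen by hashing, work is sent to that worker, and duplicates are suppressed by a purely local table lookup. It does not implement the wide-area scheme proposed in the paper; `tds_search`, the round-robin worker loop, and the message counter are assumptions made for illustration.

```python
from collections import deque


def tds_search(start, successors, num_workers):
    """Explore a (possibly cyclic) state space, TDS style.

    Returns the set of visited states and the number of states that had
    to be sent to a different worker (a rough proxy for network traffic).
    """
    def home(state):
        return hash(state) % num_workers   # owner of this state's table entry

    queues = [deque() for _ in range(num_workers)]   # per-worker work queues
    tables = [set() for _ in range(num_workers)]     # per-worker transposition tables
    queues[home(start)].append(start)
    messages = 0

    while any(queues):
        for w in range(num_workers):
            if not queues[w]:
                continue
            state = queues[w].popleft()
            if state in tables[w]:
                continue                   # duplicate found with a local lookup
            tables[w].add(state)
            for nxt in successors(state):
                dest = home(nxt)
                if dest != w:
                    messages += 1          # work migrates to the owning worker
                queues[dest].append(nxt)

    return set().union(*tables), messages
```

Because every successor may cross to another worker, the message count grows with the size of the search, which matches the abstract's point that plain TDS needs more bandwidth than wide-area links provide.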
Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945207
J. Skicewicz, P. Dinda, J. Schopf
Different adaptive applications are interested in the dynamic behavior of a resource over different fine- to coarse-grain time-scales. The resource's sensor runs at some fine-grain resource-appropriate sampling rate, producing a discrete-time resource signal. It can be very inefficient to answer a coarse-grain application query by directly using the fine-grain resource signal. We address this gap between the sensor and its different client applications with a novel query model that explicitly incorporates time-scale as a parameter. The query model is implemented on top of an inherently multi-scale wavelet-based representation of the signal (which could be communicated over a set of multicast channels). A query uses only the wavelet coefficients necessary for its time-scale (and thus could listen to a subset of the channels), greatly reducing the data that need to be communicated. We present very promising initial results on host load signals, showing the tradeoff between compactness and query error. Finally, we describe some of the other operations that the wavelet representation enables.
Title: Multi-resolution resource behavior queries using wavelets
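The claim that a coarse-grain query needs only a subset of the wavelet coefficients can be illustrated with a small sketch. The abstract does not say which wavelet the authors use; the unnormalized (averaging) Haar transform below, and the names `haar_decompose` and `reconstruct`, are assumptions chosen because they keep the arithmetic easy to follow.

```python
def haar_decompose(signal):
    """Averaging Haar transform of a signal whose length is a power of two.

    Returns (approx, details): `approx` is the one-element coarsest
    average, `details` lists the detail coefficients per level,
    finest level first.
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])  # what averaging loses
        approx = [(a + b) / 2 for a, b in pairs]         # half-resolution signal
    return approx, details


def reconstruct(approx, details, levels_dropped):
    """Rebuild the signal while ignoring the `levels_dropped` finest levels.

    Each dropped level halves the number of samples returned, so a
    coarse query reads fewer coefficients.
    """
    result = list(approx)
    for detail in reversed(details[levels_dropped:]):    # coarsest level first
        result = [v for a, d in zip(result, detail) for v in (a + d, a - d)]
    return result
```

For a load signal such as `[4, 2, 6, 8]`, dropping one level returns the pairwise averages `[3.0, 7.0]`, and dropping none returns the original samples. A client interested only in the coarser time-scale would therefore need half of the coefficients.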
Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945210
R. Lovas, V. Sunderam
In order to solve the emerging debugging issues in the field of metacomputing, we defined the fundamental principles of an adaptive and integrated debugging and visualization tool: a novel metadebugger. The current prototype has been implemented in the Harness metacomputing framework.
Title: A metadebugger prototype for the HARNESS metacomputing framework
Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945204
Jirada Kuntraruk, W. Pottenger
One of the primary tasks in mining distributed textual data is feature extraction. The widespread digitization of information has created a wealth of data that requires novel approaches to feature extraction in a distributed environment. We propose a massively parallel model for feature extraction that employs unused cycles on networks of PCs/workstations in a highly distributed environment. We have developed an analytical model of the time and communication complexity of the feature extraction process in this environment based on feature extraction algorithms developed in our textual data mining research with HDDI™ (Hierarchical Distributed Dynamic Indexing). We show that speedups linear in the number of processors are achievable for applications involving reduction operations based on a novel, parallel pipelined model of execution. We are in the process of validating our analytical model with empirical observations based on the extraction of features from a large number of pages on the World Wide Web.
Title: Massively parallel distributed feature extraction in textual data mining using HDDI™
Pub Date: 2001-08-07 DOI: 10.1109/HPDC.2001.945213
Byoung-Dai Lee, J. Weissman
As the Internet is evolving away from providing simple connectivity towards providing more sophisticated services, it is difficult to provide efficient delivery of high-demand services to end users, due to the dynamic sharing of the network and connected servers. To address this problem, we propose the service grid architecture that incorporates dynamic replication and deletion of services.
Title: Dynamic replica management in the service grid