A synthesis of parallel out-of-core sorting programs on heterogeneous clusters
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199355
C. Cérin, Hazem Fkaier, M. Jemni
The paper considers the problem of parallel external sorting in the context of a class of heterogeneous clusters. We introduce two algorithms and compare them with one that we have previously developed. Since most common sorting algorithms assume high-speed random access to all intermediate memory, they are unsuitable when the values to be sorted do not fit in main memory. This is the case for cluster computing platforms made of standard, cheap, and scarce components. For that class of computing resources, a good use of I/O operations, compatible with the requirements of load balancing and computational complexity, is the key to success. We explore three techniques and show how they can be deployed on clusters whose processor performances are related by a multiplicative factor. We validate the approaches by showing experimental results for the load-balancing factor.
{"title":"A synthesis of parallel out-of-core sorting programs on heterogeneous clusters","authors":"C. Cérin, Hazem Fkaier, M. Jemni","doi":"10.1109/CCGRID.2003.1199355","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199355","url":null,"abstract":"The paper considers the problem of parallel external sorting in the context of a form of heterogeneous clusters. We introduce two algorithms and we compare them to another one that we have previously developed. Since most common sort algorithms assume high-speed random access to all intermediate memory, they are unsuitable if the values to be sorted don't fit in main memory. This is the case for cluster computing platforms which are made of standard, cheap and scarce components. For that class of computing resources a good use of I/O operations compatible with the requirements of load balancing and computational complexity are the key to success. We explore three techniques and show how they can be deployed for clusters with processor performances related by a multiplicative factor. We validate the approaches in showing experimental results for the load balancing factor.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"os-34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127778102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peer-to-peer keyword search using keyword relationship
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199388
K. Nakauchi, Y. Ishikawa, H. Morikawa, T. Aoyama
Decentralized and unstructured peer-to-peer (P2P) networks such as Gnutella are attractive for Internet-scale information retrieval and search systems because they require neither a centralized directory nor centralized management of overlay network topology and data placement. However, due to this decentralized architecture, current P2P keyword search systems lack useful global knowledge such as the popularity of data items and the relationships between keywords and data items. As a result, current P2P keyword search systems support only naive text-match search and can find only data items with a keyword (or metadata) exactly indicated in a query. In this paper, we present an efficient P2P search system that increases the possibility of discovering desired data items. The key mechanism is query expansion, where a received query is expanded based on keyword relationships managed in a distributed fashion by participating nodes. Keyword relationships are improved through search and retrieval processes, and each relationship is shared among nodes holding similar data items. We also present an implementation of our P2P search system.
{"title":"Peer-to-peer keyword search using keyword relationship","authors":"K. Nakauchi, Y. Ishikawa, H. Morikawa, T. Aoyama","doi":"10.1109/CCGRID.2003.1199388","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199388","url":null,"abstract":"Decentralized and unstructured peer-to-peer (P2P) networks such as Gnutella are attractive for Internet-scale information retrieval and search systems because they require neither any centralized directory nor any centralized management of overlay network topology and data placement. However, due to this decentralized architecture, current P2P keyword search systems lack useful global knowledge such as popularity of data items and relationships between keywords and data items. As a result, current P2P keyword search systems supports only naive text-match search and can find only data items with a keyword (or meta-data) exactly indicated in a query. In this paper, we show an efficient P2P search system which increases possibility of discovering desired data items. The key mechanism is query expansion, where a received query is expanded based on keyword relationships managed in a distributed fashion by participating nodes. Keyword relationships are improved through search and retrieval processes and each relationships is shared among nodes holding similar data items. We also present implementation of our P2P search system.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115240984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering hosts in P2P and global computing platforms
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199389
Abhishek Agrawal, H. Casanova
Being able to identify clusters of nearby hosts among Internet clients provides very useful information for a number of Internet and P2P applications. Examples of such applications include web applications, request routing in peer-to-peer overlay networks, and distributed computing applications. In this paper, we present and formulate the Internet host clustering problem. Leveraging previous work on Internet host distance measurement, we propose two hierarchical clustering techniques to solve this problem. The first technique is a marker-based hierarchical partitioning approach. The second technique is based on the well-known K-means clustering algorithm. We evaluated these two approaches in simulation using a representative Internet topology generated with the GT-ITM generator for over 1,000 hosts. Our simulation results demonstrate that our algorithmic clustering approaches effectively identify clusters with arbitrary diameters. Our conclusion is that, by leveraging previous work on Internet host distance estimation, it is possible to cluster Internet hosts to benefit various applications with various requirements.
{"title":"Clustering hosts in P2P and global computing platforms","authors":"Abhishek Agrawal, H. Casanova","doi":"10.1109/CCGRID.2003.1199389","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199389","url":null,"abstract":"Being able to identify clusters of nearby hosts among Internet clients provides very useful information for a number of internet and p2p applications. Examples of such applications include web applications, request routing in peer-to-peer overlay network, and distributed computing applications. In this paper, we present and formulate the internet host clustering problem. Leveraging previous work on internet host distance measurement, we propose two hierarchical clustering techniques to solve this problem. The first technique is a marker based hierarchical partitioning approach. The second technique is based on the well known K-means clustering algorithm. We evaluated these two approaches in simulation using a representative Internet topology generated with the GT ITM generator for over 1,000 hosts. Our simulation results demonstrate that our algorithmic clustering approaches effectively identify clusters with arbitrary diameters. Our conclusion is that by leveraging previous work on internet host distance estimation, it is possible to cluster Internet hosts to benefit various applications with various requirements.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115651425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DM²: a distributed medical data manager for grids
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199421
H. Duque, J. Montagnat, J. Pierson, L. Brunie, I. Magnin
Medical data represent a tremendous amount of data for which automatic analysis is increasingly needed. Grids are very promising for facing today's challenging health issues, such as epidemiological studies over large image data sets. However, the sensitive nature of medical data makes it difficult to widely distribute medical applications over computational grids. In this paper, we review fundamental medical data manipulation requirements and propose a distributed data management architecture that addresses medical data security and high-performance constraints. A prototype is currently being developed in our laboratories to demonstrate the architecture's ability to handle realistic distributed medical data manipulation scenarios.
{"title":"DM/sup 2/: a distributed medical data manager for grids","authors":"H. Duque, J. Montagnat, J. Pierson, L. Brunie, I. Magnin","doi":"10.1109/CCGRID.2003.1199421","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199421","url":null,"abstract":"Medical data represent tremendous amount of data for which automatic analysis is increasingly needed. Grids are very promising to face today's challenging health issues such as epidemiological studies through large image data sets. However, the sensitive nature of medical data makes it difficult to widely distribute medical applications over computational grids. In this paper, we review fundamental medical data manipulation requirements and we propose a distributed data management architecture that addresses the medical data security and high performance constraints. A prototype is currently being developed inside our laboratories to demonstrate the architecture capability to face realistic distributed medical data manipulation situations.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130833630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance analysis of parallel I/O scheduling approaches on cluster computing systems
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199439
J. Abawajy
As computation and communication hardware performance continues to increase rapidly, I/O represents a growing fraction of application execution time. This gap between the I/O subsystem and the rest of the system is expected to widen in the future, since I/O performance is limited by physical motion. Therefore, it is imperative that novel techniques for improving I/O performance be developed. Parallel I/O is a promising approach to alleviating this bottleneck. However, very little work exists on explicitly scheduling parallel I/O operations. In this paper, we address the problem of effective management of parallel I/O in cluster computing systems by using appropriate I/O scheduling strategies. We propose two new I/O scheduling algorithms and compare them with two existing scheduling approaches. The preliminary results show that the proposed policies substantially outperform existing policies.
{"title":"Performance analysis of parallel I/O scheduling approaches on cluster computing systems","authors":"J. Abawajy","doi":"10.1109/CCGRID.2003.1199439","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199439","url":null,"abstract":"As computation and communication hardware performance continue to rapidly increase, I/O represents a growing fraction of application execution time. This gap between the I/O subsystem and others is expected to increase in future since I/O performance is limited by physical motion. Therefore, it is imperative that novel techniques for improving I/O performance be developed. Parallel I/O is a promising approach to alleviating this bottleneck. However, very little work exist with respect to scheduling parallel I/O operations explicitly. In this paper, we address the problem of effective management of parallel I/O in cluster computing systems by using appropriate I/O scheduling strategies. We propose two new I/O scheduling algorithms and compare them with two existing scheduling Approaches. The preliminary results show that the proposed policies outperform existing policies substantially.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125297588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A slacker coherence protocol for pull-based monitoring of on-line data sources
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199375
R. Sundaresan, T. Kurç, Mario Lauria, S. Parthasarathy, J. Saltz
An increasing number of online applications operate on data from disparate, and often widespread, data sources. This paper studies the design of a system for the automated monitoring of on-line data sources. In this system, a number of ad hoc data warehouses, which maintain client-specified views, are interposed between clients and data sources. We present a model of coherence, referred to here as slacker coherence, to address the freshness problem in the context of pull-based protocols. We experimentally examine various techniques for estimating update rates and polling adaptively. We also examine the impact of the request scheduling algorithm at the source on the performance of the coherence model.
{"title":"A slacker coherence protocol for pull-based monitoring of on-line data sources","authors":"R. Sundaresan, T. Kurç, Mario Lauria, S. Parthasarathy, J. Saltz","doi":"10.1109/CCGRID.2003.1199375","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199375","url":null,"abstract":"An increasing number of online applications operate on data from disparate, and often wide-spread, data sources. This paper studies the design of a system for the automated monitoring of on-line data sources. In this system a number of ad-hoc data warehouses, which maintain client-specified views, are interposed between clients and data sources. We present a model of coherence, referred to here as slacker coherence, to address the freshness problem in the context of pull-based protocols. We experimentally examine various techniques for estimating update rates and polling adaptively. We also look at the impact on the coherence model performance of the request scheduling algorithm at the source.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125621717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Checkpointing and recovery of shared memory parallel applications in a cluster
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199403
R. Badrinath, C. Morin, Geoffroy R. Vallée
This paper describes issues in the design and implementation of checkpointing and recovery modules for the Kerrighed DSM cluster system. Our design targets a DSM supporting the sequential consistency model. The mechanisms are general enough to be used in a number of different checkpointing and recovery protocols, are designed to support common performance optimizations suggested in the literature, and stay lightweight during fault-free execution. We also present preliminary performance results of the current implementation.
{"title":"Checkpointing and recovery of shared memory parallel applications in a cluster","authors":"R. Badrinath, C. Morin, Geoffroy R. Vallée","doi":"10.1109/CCGRID.2003.1199403","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199403","url":null,"abstract":"This paper describes issues in the design and implementation of checkpointing and recovery modules for the Kerrighed DSM cluster system. Our design is for a DSM supporting the sequential consistency model. The mechanisms are general enough to be used in a number of different checkpointing and recovery protocols. It is designed to support common optimizations for performance suggested in literature, while staying light-weight during fault free execution. We also present preliminary performance results of the current implementation.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122462359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Creating services with hard guarantees from cycle-harvesting systems
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199372
Chris M. Kenyon, G. Cheliotis
Cycle-harvesting is a significant part of the Grid computing landscape. However, creating commercial service contracts based on resources made available by cycle-harvesting is a significant challenge: first, the characteristics of the harvested resources are inherently stochastic; and second, in a commercial environment, purchasers can expect providers to optimize against the quality-of-service (QoS) definitions. The essential point for creating commercially valuable QoS definitions is to guarantee a set of statistical parameters for each contract instance. Here we describe an appropriate QoS definition, Hard Statistical QoS (HSQ), and show how it can be implemented using a hybrid stochastic-deterministic system. We analyze the algorithm's behavior analytically, using a distribution-free approach, in terms of the expected proportion of deterministic resources required to meet an HSQ specification. We conclude that commercial service contracts based on cycle-harvested resources are viable both conceptually and quantitatively.
{"title":"Creating services with hard guarantees from cycle-harvesting systems","authors":"Chris M. Kenyon, G. Cheliotis","doi":"10.1109/CCGRID.2003.1199372","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199372","url":null,"abstract":"Cycle-harvesting is a significant part of the Grid computing landscape. However, creating commercial service contracts based on resources made available by cycle-harvesting is a significant challenge: the characteristics of the harvested resources are inherently stochastic; and secondly, in a commercial environment, purchasers can expect providers to optimize against the quality of service (QoS) definitions. The essential point for creating commercially valuable QoS definitions is to guarantee a set of statistical parameters for each contract instance. Here we describe an appropriate QoS definition, Hard Statistical QoS (HSQ), and show how this can be implemented using a hybrid stochastic-deterministic system. We analyze algorithm behavior analytically using a distribution free approach versus the expected proportion of deterministic resources required for an HSQ specification. We conclude that commercial service contracts based on cycle-harvested resources are viable both from a conceptual point of view and quantitatively.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"425 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134064699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An overlay-network approach for distributed access to SRS
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199420
T. Fuhrmann, A. Schafferhans, T. Etzold
SRS is a widely used system for integrating biological databases. Currently, SRS relies only on locally provided copies of these databases. In this paper, we propose a mechanism that also allows the seamless integration of remote databases. To this end, our proposed mechanism splits the existing SRS functionality into two components and adds a third component that enables us to employ peer-to-peer computing techniques to create optimized overlay networks within which database queries can be routed efficiently. As an additional benefit, this mechanism also reduces the administration effort that would be needed with a conventional approach using replicated databases.
{"title":"An overlay-network approach for distributed access to SRS","authors":"T. Fuhrmann, A. Schafferhans, T. Etzold","doi":"10.1109/CCGRID.2003.1199420","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199420","url":null,"abstract":"SRS is a widely used system for integrating biological databases. Currently, SRS relies only on locally provided copies of these databases. In this paper we propose a mechanism that also allows the seamless integration of remote databases. To this end, our proposed mechanism splits the existing SRS functionality into two components and adds a third component that enables us to employ peer-to-peer computing techniques to create optimized overlay-networks within which database queries can efficiently be routed. As an additional benefit, this mechanism also reduces the administration effort that would be needed with a conventional approach using replicated databases.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114012209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cluster infrastructure for biological and health related research
Pub Date: 2003-05-12, DOI: 10.1109/CCGRID.2003.1199416
Sophia Corsava, V. Getov
Researchers in the biological and health industries need powerful and stable systems for their work. These systems must be dependable, fault-tolerant, highly available, and easy to use. To cope with these demands, we propose the use of computational and data clusters in a fail-over configuration, combined with grid technology and job scheduling. Our infrastructure has been deployed successfully for running time-critical applications in commercial environments. We also present experimental results from this pilot implementation that demonstrate the viability of our approach.
{"title":"Cluster infrastructure for biological and health related research","authors":"Sophia Corsava, V. Getov","doi":"10.1109/CCGRID.2003.1199416","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199416","url":null,"abstract":"Researchers in the biological and health industries need powerful and stable systems for their work. These systems must be dependable, fault-tolerant, highly available and easy to use. To cope with these demands we propose the use of computational and data clusters in a fail-over configuration combined with the grid technology and job scheduling. Our infrastructure has been deployed successfully for running time-critical applications in commercial environments. We also present experimental results from this pilot implementation that demonstrate the viability of our approach.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115551329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}