Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392620
LOTS: a software DSM supporting large object space
B. Cheung, Cho-Li Wang, F. Lau
Software DSM provides good programmability for cluster computing, but its performance and the limited shared memory space available to large applications hinder its popularity. This paper introduces LOTS, a C++ runtime library supporting a large shared object space. With its dynamic memory mapping mechanism, LOTS can map more objects than fit in the local process space, paging them lazily from local disk into virtual memory on access and leaving only a small trace of control information for each object in the process space. To our knowledge, LOTS is the first pure runtime software DSM supporting a shared object space larger than the local process space. Our testing shows that LOTS can utilize all the free hard disk space available to support hundreds of gigabytes of shared objects with small overhead. The scope consistency memory model and a mixed coherence protocol allow LOTS to achieve better scalability with respect to both problem size and cluster size.
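A minimal sketch of the on-demand mapping idea, assuming POSIX mmap as the underlying primitive; the struct and function names are hypothetical, not the actual LOTS API:

```cpp
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical per-object control record: this small structure is all
// that stays resident in the process space while the object is unmapped.
struct ObjectRecord {
    const char* backing_path;  // file on local disk holding the object's data
    size_t      size;          // object size in bytes
    void*       addr;          // nullptr until mapped on first access
};

// Map the object's backing file into virtual memory on demand.
void* acquire(ObjectRecord& obj) {
    if (obj.addr == nullptr) {
        int fd = open(obj.backing_path, O_RDWR);
        if (fd < 0) return nullptr;
        void* p = mmap(nullptr, obj.size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);                       // the mapping outlives the descriptor
        obj.addr = (p == MAP_FAILED) ? nullptr : p;
    }
    return obj.addr;
}

// Unmap to reclaim virtual memory; only the control record remains.
void release(ObjectRecord& obj) {
    if (obj.addr != nullptr) {
        munmap(obj.addr, obj.size);
        obj.addr = nullptr;
    }
}
```

Evicting mapped objects this way is what lets the shared object space grow beyond the process's virtual memory budget: disk capacity, not address space, becomes the limit.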
{"title":"LOTS: a software DSM supporting large object space","authors":"B. Cheung, Cho-Li Wang, F. Lau","doi":"10.1109/CLUSTR.2004.1392620","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392620","url":null,"abstract":"Software DSM provides good programmability for cluster computing, but its performance and limited shared memory space for large applications hinder its popularity. This paper introduces LOTS, a C++ runtime library supporting a large shared object space. With its dynamic memory mapping mechanism, LOTS can map more objects, lazily from the local disk to the virtual memory during access, leaving only a trace of control information for each object in the local process space. To our knowledge, LOTS is the first pure runtime software DSM supporting a shared object space larger than the local process space. Our testing shows that LOTS can utilize all the free hard disk space available to support hundreds of gigabytes of shared objects with a small overhead. The scope consistency memory model and a mixed coherence protocol allow LOTS to achieve better scalability with respect to problem size and cluster size.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131042119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392595
MPI tuning with Intel® Trace Analyzer and Intel® Trace Collector
R. Asbury, M. Wrinn
Intel® Cluster Tools help developers of distributed parallel software analyze and optimize applications on clusters. This tutorial uses a combination of lecture, demos, and (primarily) lab exercises with these tools to introduce event-based tracing techniques for MPI applications. The tools used in this tutorial were formerly marketed as Vampir and Vampirtrace.
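Event-based MPI tracers of this kind typically hook in through the standard PMPI profiling interface; a hedged sketch of the idea (not the actual Trace Collector implementation):

```cpp
#include <mpi.h>
#include <cstdio>

// The MPI standard reserves PMPI_* entry points so that tools can
// interpose on the MPI_* names. This wrapper records entry/exit
// timestamps around each send, the essence of event-based tracing.
extern "C" int MPI_Send(const void* buf, int count, MPI_Datatype dt,
                        int dest, int tag, MPI_Comm comm) {
    double t0 = MPI_Wtime();                       // entry event
    int rc = PMPI_Send(buf, count, dt, dest, tag, comm);
    double t1 = MPI_Wtime();                       // exit event
    std::fprintf(stderr, "MPI_Send to %d took %.6f s\n", dest, t1 - t0);
    return rc;
}
```

Linking such wrappers ahead of the MPI library captures per-call events without any change to the application source, which is what makes this approach practical for tuning.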
{"title":"MPI tuning with Intel/spl copy/ Trace Analyzer and Intel/spl copy/ Trace Collector","authors":"R. Asbury, M. Wrinn","doi":"10.1109/CLUSTR.2004.1392595","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392595","url":null,"abstract":"Intel/spl copy/ Cluster Tools assist developers of distributed parallel software to analyze and optimize applications on clusters. This tutorial uses a combination of lecture, demo, and (primarily) lab exercises with these tools to introduce event-based tracing techniques for MPI applications. The tools used in this tutorial were formerly marketed as Vampir and Vampirtrace.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123994084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392656
A community faulted-crust model using PYRAMID on cluster platforms
J. Parker, G. Lyzenga, C. Norton, E. Tisdale, A. Donnellan
Development has boosted the GeoFEST system for simulating the faulted crust from a local desktop research application to a community model deployed on advanced cluster platforms, including Apple G5, Intel P4, SGI Altix 3000, and HP Itanium 2 clusters. GeoFEST uses unstructured tetrahedral meshes to follow details of stress evolution, fault slip, and plastic/elastic processes in quake-prone inhomogeneous regions, like Los Angeles. This makes it ideal for interpreting GPS and radar measurements of deformation. To remake GeoFEST as a high-performance community code, the essential new features are Web accessibility, scalable performance on popular clusters, and parallel adaptive mesh refinement (PAMR). While the GeoFEST source is available for free download, a Web portal environment is also supported. Users can work entirely within a Web browser, from problem definition to results animation, using tools such as a fault database, meshing, GeoFEST itself, and visualization. For scalable deployment, GeoFEST now relies on the PYRAMID library. The direct solver was rewritten as an iterative method, using PYRAMID's support for partitioning. Analysis determined that scaling is most sensitive to the solver communication required at the domain boundaries. Direct pairwise exchange proved successful (scaling linearly), while a binary tree method involving all domains did not. On current Intel clusters with Myrinet, the application has insignificant communication overhead for problems down to roughly 1000 elements per processor. Over one million elements run well on 64 processors. Initial tests using PYRAMID for the PAMR (essential for regional simulations) with a strain-energy metric produce quality meshes.
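The pairwise exchange the analysis favors can be sketched as follows (hypothetical code illustrating the pattern, not GeoFEST/PYRAMID source): each rank posts nonblocking receives and sends only to its neighboring subdomains, so the cost stays proportional to the local boundary size instead of growing with the total number of domains.

```cpp
#include <mpi.h>
#include <vector>

// Direct pairwise boundary exchange: no collective spanning all domains,
// just point-to-point traffic with each neighboring subdomain.
void exchange_boundaries(const std::vector<int>& neighbors,
                         std::vector<std::vector<double>>& send_buf,
                         std::vector<std::vector<double>>& recv_buf) {
    std::vector<MPI_Request> reqs(2 * neighbors.size());
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        MPI_Irecv(recv_buf[i].data(), static_cast<int>(recv_buf[i].size()),
                  MPI_DOUBLE, neighbors[i], 0, MPI_COMM_WORLD, &reqs[2 * i]);
        MPI_Isend(send_buf[i].data(), static_cast<int>(send_buf[i].size()),
                  MPI_DOUBLE, neighbors[i], 0, MPI_COMM_WORLD, &reqs[2 * i + 1]);
    }
    MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);
}
```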
{"title":"A community faulted-crust model using PYRAMID on cluster platforms","authors":"J. Parker, G. Lyzenga, C. Norton, E. Tisdale, A. Donnellan","doi":"10.1109/CLUSTR.2004.1392656","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392656","url":null,"abstract":"Development has boosted the GeoFEST system for simulating the faulted crust from a local desktop research application to a community model deployed on advanced cluster platforms, including an Apple G5, Intel P4, SGI Altix 3000, and HP Itaniam 2 clusters. GeoFEST uses unstructured tetrahedral meshes to follow details of stress evolution, fault slip, and plastic/elastic processes in quake-prone inhomogeneous regions, like Los Angeles. This makes it ideal for interpreting GPS and radar measurements of deformation. To remake GeoFEST as a high-performance community code, essential new features are Web accessibility, scalable performance on popular clusters, and parallel adaptive mesh refinement (PAMR). While GeoFEST source is available for free download, a Web portal environment is also supported. Users cap work entirely within a Web browser from problem definition to results animation, using tools like a database of faults, meshing, GeoFEST, and visualization. For scalable deployment, GeoFEST now relies on the PYRAMID library. The direct solver was rewritten as an iterative method, using PYRAMID'S support for partitioning. Analysis determined that scaling is most sensitive to solver communication required at the domain boundaries. Direct pairwise exchange proved successful (linear), while a binary tree method involving all domains was not. On current Intel clusters with Myrinet the application has insignificant communication overhead for problems down to /spl sim/1000s of elements per processor. Over one million elements run well on 64 processors. Initial tests using PYRAMID for the PAMR (essential for regional simulations) and a strain-energy metric produce quality meshes.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133922441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392617
A comparison of 4X InfiniBand and Quadrics Elan-4 technologies
R. Brightwell, D. Doerfler, K. Underwood
Quadrics Elan-4 and 4X InfiniBand have comparable performance in terms of peak bandwidth and ping-pong latency. In contrast, the two network architectures differ dramatically in details ranging from signaling technologies to programming interface design to software stacks. Both networks compete in the high performance computing marketplace, and InfiniBand is currently receiving a significant amount of attention, due mostly to its potential cost/performance advantage. This work compares 4X InfiniBand and Quadrics Elan-4 on identical compute hardware using application benchmarks of importance to the DOE community. We use scaling efficiency as the main performance metric, and we also provide a cost analysis for different network configurations. Although our 32-node test platform is relatively small, some scaling issues are evident. In general, the Quadrics hardware scales slightly better on most of the applications tested.
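Scaling efficiency, the paper's main metric, can be stated in one line; a sketch under the usual strong-scaling definition (the exact normalization used in the paper may differ):

```cpp
// Strong-scaling efficiency relative to a reference run:
// E = (T_ref * P_ref) / (T_P * P); 1.0 is ideal, lower means overhead.
double scaling_efficiency(double t_ref, int p_ref, double t_p, int p) {
    return (t_ref * p_ref) / (t_p * p);
}
```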
{"title":"A comparison of 4X InfiniBand and Quadrics Elan-4 technologies","authors":"R. Brightwell, D. Doerfler, K. Underwood","doi":"10.1109/CLUSTR.2004.1392617","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392617","url":null,"abstract":"Quadrics Elan-4 and 4X InfiniBand have comparable performance in terms of peak bandwidth and ping-pong latency. In contrast, the two network architectures differ dramatically in details ranging from signaling technologies to programming interface design to software stacks. Both networks compete in the high performance computing marketplace, and InfiniBand is currently receiving a significant amount of attention, due mostly to its potential cost/performance advantage. This work compares 4X InfiniBand and Quadrics Elan-4 on identical compute hardware using application benchmarks of importance to the DOE community. We use scaling efficiency as the main performance metric, and we also provide a cost analysis for different network configurations. Although our 32-node test platform is relatively small, some scaling issues are evident. In general, the Quadrics hardware scales slightly better on most of the applications tested.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132920332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392598
Communicating efficiently on cluster-based grids with MPICH-VMI
A. Pant, Hassan Jafri
The emerging infrastructure of computational grids composed of clusters-of-clusters (CoC) interlinked through high-throughput channels promises unprecedented raw compute power for terascale applications. Projects such as the NSF TeraGrid and EU DataGrid deploy CoCs across multiple geographical sites, providing tens of teraflops. Efficient scaling of terascale applications on these grids poses a challenge due to the heterogeneous nature of the resources (operating systems and SANs) present at each site, which makes interoperability among multiple clusters difficult. In addition, due to the enormous disparity in latency and throughput between channels within a SAN and those interlinking multiple clusters, these CoC grids contain deep communication hierarchies that prohibit efficient scaling of tightly-coupled applications. We present the design of a grid-enabled MPI called MPICH-VMI for running terascale applications over CoC-based computational grids. MPICH-VMI is based on the MPICH implementation of the MPI 1.1 standard and utilizes a middleware messaging library called the virtual machine interface (VMI). VMI enables MPICH-VMI to communicate over the heterogeneous networks common in CoC-based grids. MPICH-VMI also features novel optimizations for hiding the communication hierarchies present in CoC-based grids. We also present some preliminary results with MPICH-VMI running MPI benchmarks and applications on the TeraGrid.
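One common way to hide such a hierarchy (a generic illustration, not the actual MPICH-VMI internals) is to keep collective traffic inside each cluster and let only one leader per site cross the slow wide-area links:

```cpp
#include <mpi.h>

// Sum a value across a cluster-of-clusters: reduce inside each site
// over the fast SAN first, then combine across sites via leaders only.
double hierarchical_sum(double local, int site_id, MPI_Comm world) {
    int rank;
    MPI_Comm_rank(world, &rank);

    MPI_Comm site;                         // intra-cluster communicator
    MPI_Comm_split(world, site_id, rank, &site);

    double site_sum = 0.0;                 // SAN-local reduction
    MPI_Reduce(&local, &site_sum, 1, MPI_DOUBLE, MPI_SUM, 0, site);

    int site_rank;
    MPI_Comm_rank(site, &site_rank);
    MPI_Comm leaders;                      // one rank per site joins
    MPI_Comm_split(world, site_rank == 0 ? 0 : MPI_UNDEFINED, rank, &leaders);

    double total = 0.0;
    if (leaders != MPI_COMM_NULL) {        // wide-area traffic: leaders only
        MPI_Allreduce(&site_sum, &total, 1, MPI_DOUBLE, MPI_SUM, leaders);
        MPI_Comm_free(&leaders);
    }
    MPI_Bcast(&total, 1, MPI_DOUBLE, 0, site);  // fan out within each site
    MPI_Comm_free(&site);
    return total;
}
```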
{"title":"Communicating efficiently on cluster based grids with MPICH-VMI","authors":"A. Pant, Hassan Jafri","doi":"10.1109/CLUSTR.2004.1392598","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392598","url":null,"abstract":"Emerging infrastructure of computational grids composed of clusters-of-clusters (CoC) interlinked through high throughput channels promises unprecedented raw compute power for terascale applications. Projects such as the NSF Teragrid and EU Datagrid deploy CoCs across multiple geographical sites providing tens ofteraflops. Efficient scaling of terascale applications on these grids poses a challenge due to the heterogeneous nature of the resources (operating systems and SANs) present at each site that makes interoperability among multiple clusters difficult. In addition, due to the enormous disparity in latency and throughput of the channels within the SAN and those interlinking multiple clusters, these CoC grids contain deep communication hierarchies that prohibit efficient scaling of tightly-coupled applications. We present a design of a grid-enabled MPI called MPICH-VMI for running terascale applications over CoC based computational grids. MPICH- VMI is based on MPICH implementation of MPI 1.1 standard and utilizes a middleware messaging library called the virtual machine interface (VMI). VM enables MPICH- VMI to communicate over heterogeneous networks common in CoC based grid. MPICH-VMI also features novel optimizations for hiding communication hierarchies present in CoC based grids. We also present some preliminary results with MPICH-VMI running on the TeraGridfor MPl benchmarks and applications.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123393940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392643
Give your bootstrap the boot: using the operating system to boot the operating system
R. Minnich
One of the slowest and most annoying aspects of system management is the simple act of rebooting the system. The sysadmin starts from a known state (the OS is running) and hands the computer over to an untrustworthy piece of software. With enough nodes involved, there is a certain chance that the process will fail on one of them. Bootstrapping is well named: it takes the system down to a low level, from which return is uncertain. It would be much better if we could use the known, trusted OS software to manage the boot process. The OS can apply all its power to the problem of locating, verifying, and loading a new OS image. Error checking and feedback can be far more robust. We discuss five systems for Linux and Plan 9 that allow the OS to boot the OS. These systems allow for the complete elimination of the old-fashioned bootstrap.
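On modern Linux, kexec is one widely available mechanism in this family: the running kernel loads a new kernel image and jumps into it without going back through firmware. A hedged sketch driving the kexec-tools CLI from C++ (the image paths are hypothetical, and kexec is an example of the approach, not necessarily one of the five systems the paper covers):

```cpp
#include <cstdlib>

int main() {
    // Stage the new kernel while the trusted OS can still verify it
    // and report errors; failure here leaves the system running.
    int rc = std::system(
        "kexec -l /boot/vmlinuz-new --initrd=/boot/initrd-new");
    if (rc != 0) return 1;        // robust error path: we never went down

    return std::system("kexec -e");  // jump into the staged kernel
}
```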
{"title":"Give your bootstrap the boot: using the operating system to boot the operating system","authors":"R. Minnich","doi":"10.1109/CLUSTR.2004.1392643","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392643","url":null,"abstract":"One of the slowest and most annoying aspects of system management is the simple act of rebooting the system. The sysadmin starts from a known state $the OS is running - and hands the computer over to an untrustworthy piece of software. With enough nodes involved, there is a certain chance that the process will fail on one of them. Bootstrapping is well named - it takes the system down to a low level, from which return is uncertain. It would be much better if we could use the known, trusted OS software to manage the boot process. The OS can apply all its power to the problem of locating, verifying, and loading a new OS image. Error checking and feedback can be far more robust. We discuss five systems for Linux and Plan 9 that allow the OS to boot the OS. These systems allow for the complete elimination of old-fashioned bootstrap.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122897061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392635
Performance analysis tools for large-scale Linux clusters
Z. Cvetanovic
As cluster computing environments increase in size and complexity, it is becoming more challenging to analyze and identify the factors that limit performance and scalability. Easy-to-use tools that help identify such bottlenecks are crucial for tuning applications and configuring systems for best performance. We present a collection of visualization tools that allow users to monitor load on all cluster components simultaneously, with negligible overhead and no changes to the application. We include examples where the tools have been used to identify bottlenecks within a cluster and improve performance. We provide several examples of application profiles gathered using the tools and outline a methodology for projecting the performance of future cluster platforms.
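Monitoring with negligible overhead and no application changes usually means sampling kernel counters out-of-band rather than instrumenting the code; a minimal sketch of such a sampler (hypothetical, not the paper's tool):

```cpp
#include <fstream>
#include <string>
#include <iostream>
#include <thread>
#include <chrono>

int main() {
    // Periodically read the aggregate CPU counters the kernel already
    // maintains; the application itself is never touched.
    for (int sample = 0; sample < 10; ++sample) {
        std::ifstream stat("/proc/stat");
        std::string cpu_line;
        std::getline(stat, cpu_line);       // first line: total CPU time split
        std::cout << cpu_line << '\n';       // ship to the visualization side
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
```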
{"title":"Performance analysis tools for large-scale Linux clusters","authors":"Z. Cvetanovic","doi":"10.1109/CLUSTR.2004.1392635","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392635","url":null,"abstract":"As cluster computer environments increase in size and complexity, it is becoming more challenging to analyze and identify factors that limit performance and scalability. Easy-to-use tools that help identify such bottlenecks are crucial for tuning applications and configuring systems for best performance. We present a collection of visualization tools, which allow users to monitor load on all cluster components simultaneously, with negligible overhead, and no changes in the application. We include examples where the tools have been used to identify bottlenecks within a cluster and improve performance. We provide several examples of application profiles gathered using the tools and outline the methodology for projecting performance of future cluster platforms.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126581633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392653
Fast broadcast by the divide-and-conquer algorithm
Dongyoung Kim, Dongseung Kim
Collective communication functions, including broadcast, usually take O(m log P) time in cluster computers to propagate a size-m message to P processors. We have devised a new O(m) broadcast algorithm, independent of the number of processors involved, using a divide-and-conquer algorithm. Details are given below.
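The paper's own divide-and-conquer scheme is not reproduced here, but a well-known way to bring the broadcast bandwidth cost down to O(m) is to divide the message across the processors with a scatter and reassemble it with an allgather; a sketch:

```cpp
#include <mpi.h>
#include <vector>

// Broadcast by scatter + allgather: the root splits the size-m message
// into P pieces of size m/P, and every process reassembles the whole
// message, moving O(m) data regardless of P.
void bcast_scatter_allgather(std::vector<double>& msg, int root, MPI_Comm comm) {
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    int chunk = static_cast<int>(msg.size()) / nprocs;  // assume P divides m
    std::vector<double> piece(chunk);

    MPI_Scatter(msg.data(), chunk, MPI_DOUBLE,
                piece.data(), chunk, MPI_DOUBLE, root, comm);
    MPI_Allgather(piece.data(), chunk, MPI_DOUBLE,
                  msg.data(), chunk, MPI_DOUBLE, comm);
}
```

Because each process moves O(m) data no matter how many processors participate, the log P factor drops out of the bandwidth term, which is the property the paper targets.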
{"title":"Fast broadcast by the divide-and-conquer algorithm","authors":"Dongyoung Kim, Dongseung Kim","doi":"10.1109/CLUSTR.2004.1392653","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392653","url":null,"abstract":"Collective communication functions including the broadcast in cluster computers usually take O(m log P) time in propagating the size-m message to P processors. We have devised a new O(m) broadcast algorithm, independent of the number of processors involved, by using divided-and-conquer algorithm. Details are given below.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125752696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392645
Implementing parallel conjugate gradient on the EARTH multithreaded architecture
Fei Chen, K. B. Theobald, G. Gao
Conjugate gradient (CG) is one of the most popular iterative approaches to solving large sparse linear systems of equations. This work reports a parallel implementation of CG on clusters with EARTH multithreaded runtime support. Interphase and intraphase communication costs are balanced using a two-dimensional blocking method, minimizing overall communication costs. EARTH's adaptive, event-driven multithreaded execution model gives additional opportunities to overlap communication and computation to achieve even better scalability. Experiments on a large Beowulf cluster with Gigabit Ethernet show notable improvements over other parallel CG implementations. For example, with the NAS CG benchmark at problem size Class C, our implementation achieved a speedup of 41 on a 64-node cluster, compared to 13 for the MPI-based NAS version. The results demonstrate that the combination of the two-dimensional blocking method and the EARTH architectural runtime support helps to compensate for the low communication bandwidth common to most clusters.
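The overlap idea can be illustrated with explicit message passing (a generic sketch; EARTH expresses this through adaptive event-driven threads rather than MPI calls): start the halo exchange, compute the interior part of the matrix-vector product that needs no remote data, then finish the exchange and compute the boundary part.

```cpp
#include <mpi.h>
#include <vector>
#include <functional>

// One overlapped step for a 1-D decomposition with left/right neighbors;
// compute_interior runs while the messages are in flight.
void overlapped_step(std::vector<double>& send_left, std::vector<double>& send_right,
                     std::vector<double>& recv_left, std::vector<double>& recv_right,
                     int left, int right, MPI_Comm comm,
                     const std::function<void()>& compute_interior,
                     const std::function<void()>& compute_boundary) {
    MPI_Request reqs[4];
    MPI_Irecv(recv_left.data(),  static_cast<int>(recv_left.size()),
              MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(recv_right.data(), static_cast<int>(recv_right.size()),
              MPI_DOUBLE, right, 0, comm, &reqs[1]);
    MPI_Isend(send_left.data(),  static_cast<int>(send_left.size()),
              MPI_DOUBLE, left,  0, comm, &reqs[2]);
    MPI_Isend(send_right.data(), static_cast<int>(send_right.size()),
              MPI_DOUBLE, right, 0, comm, &reqs[3]);

    compute_interior();                        // hides the message latency
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    compute_boundary();                        // remote halo values now present
}
```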
{"title":"Implementing parallel conjugate gradient on the EARTH multithreaded architecture","authors":"Fei Chen, K. B. Theobald, G. Gao","doi":"10.1109/CLUSTR.2004.1392645","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392645","url":null,"abstract":"Conjugate gradient (CG) is one of the most popular iterative approaches to solving large sparse linear systems of equations. This work reports a parallel implementation of CG on clusters with EARTH multithreaded runtime support. Interphase and intraphase communication costs are balanced using a two-dimensional blocking method, minimizing overall communication costs. EARTH'S adaptive, event-driven multithreaded execution model gives additional opportunities to overlap communication and computation to achieve even better scalability. Experiments on a large Beowulf cluster with gigabit Ethernet show notable improvements over other parallel CG implementations. For example, with the NAS CG benchmark problem size Class C, our implementation achieved a speedup of 41 on a 64-node cluster, compared to 13 for the MPl-based NAS version. The results demonstrate that the combination of the two-dimensional blocking method and the EARTH architectural runtime support helps to compensate for the low communications bandwidth common to most clusters.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133788551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392644
Parallel competitive learning algorithm for fast codebook design on partitioned space
S. Momose, K. Sano, K. Suzuki, Tadao Nakamura
Vector quantization (VQ) is an attractive technique for lossy data compression, which is a key technology for data storage and/or transfer. So far, various competitive learning (CL) algorithms have been proposed to design optimal codebooks that quantize with minimal error. However, their practical use has been limited for large-scale problems due to the computational complexity of competitive learning. This work presents a parallel competitive learning algorithm for fast codebook design based on space partitioning. The algorithm partitions the input-vector space into subspaces and independently designs a corresponding subcodebook for each, reducing the computational complexity. The subspaces can be processed in parallel without synchronization overhead, resulting in high scalability. We perform experiments on parallel codebook design on a commodity PC cluster with 8 nodes. Experimental results show that a high speedup in codebook design is obtained without increasing quantization error.
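A minimal sketch of the per-node work under this scheme (hypothetical code, not the paper's implementation): each node receives only the input vectors that fall in its subspace and runs plain winner-take-all competitive learning on them, with no communication until the subcodebooks are concatenated at the end.

```cpp
#include <vector>
#include <cstddef>

using Vec = std::vector<double>;

// Squared Euclidean distance between an input and a codeword.
static double dist2(const Vec& a, const Vec& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// One node's job: competitive learning over the vectors assigned to its
// subspace. 'rate' is the learning rate, decayed per presentation.
void design_subcodebook(const std::vector<Vec>& inputs,
                        std::vector<Vec>& codebook,
                        double rate, double decay) {
    for (const Vec& x : inputs) {
        std::size_t win = 0;                    // find the winning codeword
        for (std::size_t c = 1; c < codebook.size(); ++c)
            if (dist2(x, codebook[c]) < dist2(x, codebook[win])) win = c;
        for (std::size_t i = 0; i < x.size(); ++i)  // pull the winner toward x
            codebook[win][i] += rate * (x[i] - codebook[win][i]);
        rate *= decay;
    }
}
```

Since no codeword is shared across subspaces, nodes never need to synchronize during training, which is what gives the method its near-linear scalability.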
{"title":"Parallel competitive learning algorithm for fast codebook design on partitioned space","authors":"S. Momose, K. Sano, K. Suzuki, Tadao Nakamura","doi":"10.1109/CLUSTR.2004.1392644","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392644","url":null,"abstract":"Vector quantization (VQ) is an attractive technique for lossy data compression, which is a key technology for data storage and/or transfer. So far, various competitive learning (CL) algorithms have been proposed to design optimal codebooks presenting quantization with minimized errors. However, their practical use has been limited for large scale problems, due to the computational complexity of competitive learning. This work presents a parallel competitive learning algorithm for fast code-book design based on space partitioning. The algorithm partitions input-vector space into some subspaces, and independently designs corresponding subcodebooks for these subspaces with computational complexity reduced. Independent processing on different subspaces can be processed in parallel without synchronization overhead, resulting in high scalability. We perform experiments of parallel codebook design on a commodity PC cluster with 8 nodes. Experimental results show that the high speedup of the codebook design is obtained without increase of quantization errors.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121065095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}