Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392617
R. Brightwell, D. Doerfler, K. Underwood
Quadrics Elan-4 and 4X InfiniBand have comparable performance in terms of peak bandwidth and ping-pong latency. In contrast, the two network architectures differ dramatically in details ranging from signaling technologies to programming interface design to software stacks. Both networks compete in the high performance computing marketplace, and InfiniBand is currently receiving a significant amount of attention, due mostly to its potential cost/performance advantage. This work compares 4X InfiniBand and Quadrics Elan-4 on identical compute hardware using application benchmarks of importance to the DOE community. We use scaling efficiency as the main performance metric, and we also provide a cost analysis for different network configurations. Although our 32-node test platform is relatively small, some scaling issues are evident. In general, the Quadrics hardware scales slightly better on most of the applications tested.
Title: A comparison of 4X InfiniBand and Quadrics Elan-4 technologies. Published in: 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935).
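Scaling efficiency, the main metric named in the abstract above, is commonly computed as the ratio of ideal to measured runtime. The following is a minimal sketch, assuming fixed-problem-size (strong) scaling; the abstract does not state which definition the authors used, and the function name and timings are hypothetical.

```python
def scaling_efficiency(t_base, n_base, t_n, n):
    """Strong-scaling efficiency of an n-node run relative to a baseline run.

    t_base: runtime on n_base nodes; t_n: runtime on n nodes.
    A value of 1.0 means the runtime shrank in exact proportion to the node count.
    """
    if t_base <= 0 or t_n <= 0 or n_base <= 0 or n <= 0:
        raise ValueError("runtimes and node counts must be positive")
    return (t_base * n_base) / (t_n * n)


# Hypothetical timings: 100 s on 1 node, 4 s on 32 nodes.
efficiency = scaling_efficiency(100.0, 1, 4.0, 32)
```

Comparing this value across the two interconnects at each node count is one way to read "scales slightly better" as a number.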
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392641
Greg Bruno, M. Katz, Federico D. Sacerdoti, P. Papadopoulos
The Rocks toolkit uses a graph-based framework to describe the configuration of all node types (termed appliances) that make up a complete cluster. With hundreds of deployed clusters, our turnkey systems approach has proven easy to adapt to different hardware and logical node configurations. However, the Rocks architecture and implementation contain a significant asymmetry: the graph definition of all appliance types except the initial frontend can be modified and extended by the end-user before installation, whereas frontends can be modified only afterward by hands-on system administration. To address this administrative discontinuity between nodes and frontends, we describe the design and implementation of Rolls. First and foremost, Rolls provide both the architecture and mechanisms that enable the end-user to incrementally and programmatically modify the graph description for all appliance types. New functionality can be added and any Rocks-supplied software component can be overwritten or removed simply by inserting the desired Roll CD(s) at installation time. This symmetric approach to cluster construction has allowed us to shrink the core of the Rocks implementation while increasing flexibility for the end-user. Rolls are optional, automatically configured, cluster-aware software systems. Current add-ons include: scheduling systems (SGE, PBS), grid support (based on NSF Middleware Initiative), database support (DB2), Condor, integrity checking (Tripwire) and the Intel compiler. Community-specific Rolls can be and are developed by groups outside of the Rocks core development group.
Title: Rolls: modifying a standard system installer to support user-customizable cluster frontend appliances
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392656
J. Parker, G. Lyzenga, C. Norton, E. Tisdale, A. Donnellan
Development has boosted the GeoFEST system for simulating the faulted crust from a local desktop research application to a community model deployed on advanced cluster platforms, including Apple G5, Intel P4, SGI Altix 3000, and HP Itanium 2 clusters. GeoFEST uses unstructured tetrahedral meshes to follow details of stress evolution, fault slip, and plastic/elastic processes in quake-prone inhomogeneous regions, like Los Angeles. This makes it ideal for interpreting GPS and radar measurements of deformation. To remake GeoFEST as a high-performance community code, essential new features are Web accessibility, scalable performance on popular clusters, and parallel adaptive mesh refinement (PAMR). While GeoFEST source is available for free download, a Web portal environment is also supported. Users can work entirely within a Web browser from problem definition to results animation, using tools like a database of faults, meshing, GeoFEST, and visualization. For scalable deployment, GeoFEST now relies on the PYRAMID library. The direct solver was rewritten as an iterative method, using PYRAMID's support for partitioning. Analysis determined that scaling is most sensitive to solver communication required at the domain boundaries. Direct pairwise exchange proved successful (linear), while a binary tree method involving all domains was not. On current Intel clusters with Myrinet the application has insignificant communication overhead for problems down to ~1000s of elements per processor. Over one million elements run well on 64 processors. Initial tests using PYRAMID for the PAMR (essential for regional simulations) and a strain-energy metric produce quality meshes.
Title: A community faulted-crust model using PYRAMID on cluster platforms
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392654
Arnaud Legrand, Olivier Beaumont, L. Marchal, Y. Robert
Summary form only given. In this work, we consider the problem of allocating and scheduling a collection of independent, equal-sized tasks on heterogeneous star-shaped platforms. We also address the same problem for divisible tasks. For both cases, we take memory constraints into account. We prove strong NP-completeness results for different objective functions, namely makespan minimization and throughput maximization, on simple star-shaped platforms. We propose an approximation algorithm based on the unconstrained version (with unlimited memory) of the problem. We introduce several heuristics, which are evaluated and compared through extensive simulations. An unexpected conclusion drawn from these experiments is that classical scheduling heuristics that try to greedily minimize the completion time of each task are outperformed by the simple heuristic that consists in assigning the task to the available processor that has the smallest communication time, regardless of computation power (hence a "bandwidth-centric" distribution).
Title: Master slave scheduling on heterogeneous star-shaped platforms with limited memory
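The "bandwidth-centric" rule described in the abstract above can be illustrated with a short sketch. This is not the authors' code: the `Worker` fields, the slot-based memory model, and the simplified greedy rule used for contrast are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    comm_time: float   # time for the master to send one task (assumed model)
    comp_time: float   # time for the worker to process one task (assumed model)
    free_slots: int    # tasks that still fit in the worker's memory (assumed model)


def pick_bandwidth_centric(workers):
    """Pick the available worker with the smallest communication time,
    ignoring computation power. Returns None if no worker has free memory."""
    available = [w for w in workers if w.free_slots > 0]
    return min(available, key=lambda w: w.comm_time, default=None)


def pick_greedy_completion(workers):
    """Simplified stand-in for a classical greedy rule: pick the available
    worker that would finish a single task earliest (send time + compute time)."""
    available = [w for w in workers if w.free_slots > 0]
    return min(available, key=lambda w: w.comm_time + w.comp_time, default=None)


# Hypothetical platform: one worker has a fast CPU, one a fast link, one is full.
workers = [
    Worker("fast-cpu", comm_time=3.0, comp_time=1.0, free_slots=2),
    Worker("fast-link", comm_time=1.0, comp_time=5.0, free_slots=1),
    Worker("full", comm_time=0.5, comp_time=1.0, free_slots=0),
]
```

On this platform the two rules disagree: the bandwidth-centric rule selects `fast-link`, while the greedy rule selects `fast-cpu`. The abstract reports that the former kind of choice performed better in the authors' simulations.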
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392643
R. Minnich
One of the slowest and most annoying aspects of system management is the simple act of rebooting the system. The sysadmin starts from a known state - the OS is running - and hands the computer over to an untrustworthy piece of software. With enough nodes involved, there is a certain chance that the process will fail on one of them. Bootstrapping is well named - it takes the system down to a low level, from which return is uncertain. It would be much better if we could use the known, trusted OS software to manage the boot process. The OS can apply all its power to the problem of locating, verifying, and loading a new OS image. Error checking and feedback can be far more robust. We discuss five systems for Linux and Plan 9 that allow the OS to boot the OS. These systems allow for the complete elimination of old-fashioned bootstrap.
Title: Give your bootstrap the boot: using the operating system to boot the operating system
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392598
A. Pant, Hassan Jafri
The emerging infrastructure of computational grids composed of clusters-of-clusters (CoC) interlinked through high-throughput channels promises unprecedented raw compute power for terascale applications. Projects such as the NSF TeraGrid and EU DataGrid deploy CoCs across multiple geographical sites providing tens of teraflops. Efficient scaling of terascale applications on these grids poses a challenge because the heterogeneous resources (operating systems and SANs) present at each site make interoperability among multiple clusters difficult. In addition, due to the enormous disparity in latency and throughput between the channels within the SAN and those interlinking multiple clusters, these CoC grids contain deep communication hierarchies that prohibit efficient scaling of tightly-coupled applications. We present a design of a grid-enabled MPI called MPICH-VMI for running terascale applications over CoC-based computational grids. MPICH-VMI is based on the MPICH implementation of the MPI 1.1 standard and utilizes a middleware messaging library called the virtual machine interface (VMI). VMI enables MPICH-VMI to communicate over the heterogeneous networks common in CoC-based grids. MPICH-VMI also features novel optimizations for hiding the communication hierarchies present in CoC-based grids. We also present some preliminary results with MPICH-VMI running on the TeraGrid for MPI benchmarks and applications.
Title: Communicating efficiently on cluster based grids with MPICH-VMI
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392635
Z. Cvetanovic
As cluster computer environments increase in size and complexity, it is becoming more challenging to analyze and identify factors that limit performance and scalability. Easy-to-use tools that help identify such bottlenecks are crucial for tuning applications and configuring systems for best performance. We present a collection of visualization tools, which allow users to monitor load on all cluster components simultaneously, with negligible overhead, and no changes in the application. We include examples where the tools have been used to identify bottlenecks within a cluster and improve performance. We provide several examples of application profiles gathered using the tools and outline the methodology for projecting performance of future cluster platforms.
Title: Performance analysis tools for large-scale Linux clusters
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392653
Dongyoung Kim, Dongseung Kim
Collective communication functions in cluster computers, including broadcast, usually take O(m log P) time to propagate a size-m message to P processors. We have devised a new O(m) broadcast algorithm, independent of the number of processors involved, by using a divide-and-conquer algorithm. Details are given below.
Title: Fast broadcast by the divide-and-conquer algorithm
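The abstract above does not describe the authors' algorithm, so the sketch below substitutes a different, well-known technique to show how a broadcast's bandwidth term can become independent of P: scatter the message into P pieces, then allgather them. It uses the standard latency-bandwidth cost model, where `alpha` (per-message latency) and `beta` (per-byte transfer time) are assumed parameters and the function names are hypothetical.

```python
import math


def binomial_tree_cost(m, p, alpha, beta):
    """Binomial-tree broadcast: ceil(log2 P) rounds, each sending the full
    m-byte message, giving the O(m log P) behaviour."""
    return math.ceil(math.log2(p)) * (alpha + m * beta)


def scatter_allgather_cost(m, p, alpha, beta):
    """Scatter (binomial tree) followed by a ring allgather.

    Each phase moves (P-1)/P * m bytes per process, so the bandwidth term
    stays below 2 * m * beta for any P. The latency term still grows with P.
    """
    latency = (math.ceil(math.log2(p)) + (p - 1)) * alpha
    bandwidth = 2 * (p - 1) / p * m * beta
    return latency + bandwidth


# Hypothetical large-message case where latency is negligible (alpha = 0).
tree = binomial_tree_cost(1_000_000, 64, 0.0, 1.0)
split = scatter_allgather_cost(1_000_000, 64, 0.0, 1.0)
```

For large messages the bandwidth term dominates, which is where splitting the message pays off; for small messages the extra latency rounds make the tree the better choice under this model.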
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392645
Fei Chen, K. B. Theobald, G. Gao
Conjugate gradient (CG) is one of the most popular iterative approaches to solving large sparse linear systems of equations. This work reports a parallel implementation of CG on clusters with EARTH multithreaded runtime support. Interphase and intraphase communication costs are balanced using a two-dimensional blocking method, minimizing overall communication costs. EARTH's adaptive, event-driven multithreaded execution model gives additional opportunities to overlap communication and computation to achieve even better scalability. Experiments on a large Beowulf cluster with gigabit Ethernet show notable improvements over other parallel CG implementations. For example, with the NAS CG benchmark problem size Class C, our implementation achieved a speedup of 41 on a 64-node cluster, compared to 13 for the MPI-based NAS version. The results demonstrate that the combination of the two-dimensional blocking method and the EARTH architectural runtime support helps to compensate for the low communications bandwidth common to most clusters.
Title: Implementing parallel conjugate gradient on the EARTH multithreaded architecture
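For readers unfamiliar with the method named in the abstract above, here is a minimal serial sketch of textbook conjugate gradient for a symmetric positive-definite system. It uses dense Python lists for brevity and is not the authors' EARTH implementation; the helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A (list of row lists)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)  # in a parallel version, this step needs communication
        alpha = rs_old / dot(p, Ap)  # dot products need a global reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small example system; the exact solution is x = [1/11, 7/11].
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

The matrix-vector product and the dot products are the steps that require data from other processes in a distributed version, which is why the layout of the matrix across nodes affects communication cost.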
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392644
S. Momose, K. Sano, K. Suzuki, Tadao Nakamura
Vector quantization (VQ) is an attractive technique for lossy data compression, which is a key technology for data storage and/or transfer. So far, various competitive learning (CL) algorithms have been proposed to design optimal codebooks that minimize quantization errors. However, their practical use has been limited for large-scale problems, due to the computational complexity of competitive learning. This work presents a parallel competitive learning algorithm for fast codebook design based on space partitioning. The algorithm partitions the input-vector space into subspaces, and independently designs corresponding subcodebooks for these subspaces with reduced computational complexity. The subspaces can be processed in parallel without synchronization overhead, resulting in high scalability. We perform experiments of parallel codebook design on a commodity PC cluster with 8 nodes. Experimental results show that a high speedup of the codebook design is obtained without increasing quantization errors.
Title: Parallel competitive learning algorithm for fast codebook design on partitioned space
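The idea in the abstract above, training one subcodebook per subspace with no interaction between subspaces, can be sketched with a basic winner-take-all competitive learning update. The partition rule here (a threshold on the first coordinate), the learning rate, and all function names are assumptions for illustration; the abstract does not specify how the authors partition the space.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], x)),
    )


def train_codebook(vectors, codebook, lr=0.1, epochs=20):
    """Winner-take-all competitive learning: move only the winning codeword
    a fraction lr of the way toward each input vector."""
    codebook = [list(c) for c in codebook]  # work on a copy
    for _ in range(epochs):
        for x in vectors:
            k = nearest(codebook, x)
            codebook[k] = [c + lr * (v - c) for c, v in zip(codebook[k], x)]
    return codebook


def train_partitioned(vectors, split, init_left, init_right, **kwargs):
    """Train two subcodebooks independently on a hypothetical two-way partition.
    The two calls share no state, so they could run on different nodes."""
    left = [x for x in vectors if x[0] < split]
    right = [x for x in vectors if x[0] >= split]
    return train_codebook(left, init_left, **kwargs) + train_codebook(right, init_right, **kwargs)


# Hypothetical data: two well-separated clusters, one codeword per subspace.
data = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0]]
codebook = train_partitioned(data, 5.0, [[1.0, 1.0]], [[9.0, 9.0]])
```

Because each subcodebook only searches its own codewords, the nearest-codeword search per input is also cheaper than searching one large codebook, which is one plausible source of the reduced complexity the abstract mentions.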