FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392606
G. Zheng, L. Shi, L. Kalé
As high performance clusters continue to grow in size, the mean time between failures shrinks. Fault tolerance and reliability are therefore becoming a major challenge for application scalability. The traditional disk-based method of dealing with faults is to checkpoint the state of the entire application periodically to reliable storage and to restart from the most recent checkpoint. Recovery from a fault involves (often manually) restarting the application on all processors and having it read its checkpoint data from disk, so a restart can take minutes after it has been initiated. Such a strategy also requires that a failed processor be replaced, so that the number of processors at checkpoint time and at recovery time is the same. We present FTC-Charm++, a fault-tolerant runtime based on a scheme for fast and scalable in-memory checkpoint and restart. At restart, when no spare processor is available, the program can continue to run on the remaining processors while minimizing the performance penalty due to the lost processors. The method is suited to applications whose memory footprint at checkpoint time is small, while a variation of the scheme, in-disk checkpoint/restart, can be applied to applications with large memory footprints. The scheme does not require any individual component to be fault-free. We have implemented this scheme for Charm++ and AMPI (an adaptive version of MPI). This work describes the scheme and presents performance data from a cluster of 128 processors.
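One way to picture such an in-memory scheme is double checkpointing to a buddy processor: each processor keeps one checkpoint copy locally and a second copy in another processor's memory, so a single failure always leaves an intact copy from which the lost work can be rebuilt on the survivors without a spare node. The Python sketch below illustrates only that buddy-placement and recovery logic; all names (Processor, take_checkpoint, recover) are hypothetical and this is not the FTC-Charm++ API.

```python
# Illustrative sketch of double in-memory ("buddy") checkpointing; all names
# are hypothetical and this is not the FTC-Charm++ API.
import copy

class Processor:
    def __init__(self, rank):
        self.rank = rank
        self.state = {"iteration": 0}   # application state living on this processor
        self.own_ckpt = None            # local checkpoint copy
        self.buddy_ckpt = None          # checkpoint held on behalf of the buddy
        self.adopted = None             # work absorbed from a failed buddy

def take_checkpoint(procs):
    """Each processor snapshots its state locally and into buddy (rank+1) % P."""
    n = len(procs)
    for p in procs:
        snapshot = copy.deepcopy(p.state)
        p.own_ckpt = snapshot
        procs[(p.rank + 1) % n].buddy_ckpt = copy.deepcopy(snapshot)

def recover(procs, failed_rank):
    """Rebuild the failed rank's state from the copy in its buddy's memory.
    No spare processor is needed: here the buddy simply absorbs the recovered
    work (a real runtime would redistribute it across all survivors)."""
    n = len(procs)
    buddy = procs[(failed_rank + 1) % n]
    buddy.adopted = copy.deepcopy(buddy.buddy_ckpt)
    return buddy.adopted

procs = [Processor(r) for r in range(4)]
for p in procs:
    p.state["iteration"] = 10           # pretend the application has made progress
take_checkpoint(procs)
procs[2].state = None                   # simulate a crash of processor 2
print(recover(procs, failed_rank=2))    # {'iteration': 10}, restored from memory
```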
{"title":"FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI","authors":"G. Zheng, L. Shi, L. Kalé","doi":"10.1109/CLUSTR.2004.1392606","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392606","url":null,"abstract":"As high performance clusters continue to grow in size, the mean time between failures shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challenging factors for application scalability. The traditional disk-based method of dealing with faults is to checkpoint the state of the entire application periodically to reliable storage and restart from the recent checkpoint. The recovery of the application from faults involves (often manually) restarting applications on all processors and having it read the data from disks on all processors. The restart can therefore take minutes after it has been initiated. Such a strategy requires that the failed processor can be replaced so that the number of processors at checkpoint-time and recovery-time are the same. We present FTC-Charms ++, a fault-tolerant runtime based on a scheme for fast and scalable in-memory checkpoint and restart. At restart, when there is no extra processor, the program can continue to run on the remaining processors while minimizing the performance penalty due to losing processors. The method is useful for applications whose memory footprint is small at the checkpoint state, while a variation of this scheme - in-disk checkpoint/restart can be applied to applications with large memory footprint. The scheme does not require any individual component to be fault-free. We have implemented this scheme for Charms++ and AMPI (an adaptive version of MPl). This work describes the scheme and shows performance data on a cluster using 128 processors.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115135368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simplifying administration through dynamic reconfiguration in a cooperative cluster storage system
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392615
Renaud Lachaize, J. Hansen
Cluster storage systems, where storage devices are distributed across a large number of nodes, can reduce the I/O bottlenecks present in most centralized storage systems. However, such distributed storage devices are hard to manage efficiently. In this paper, we examine the use of explicit, component-based (command and data) paths between hosts and disks as a vehicle for performing nondisruptive storage system reconfiguration. We describe the mechanisms necessary to perform reconfigurations and show how they can be used to handle two management tasks: migration between network technologies and rebuilding a disk in a mirror. Our approach is validated through initial performance measurements of these two tasks using a prototype implementation. The results show that online reconfiguration is possible at a modest cost.
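To make the reconfiguration idea concrete, the sketch below models an explicit I/O path whose transport component can be swapped while requests issued during the swap are buffered rather than failed. It only illustrates a generic quiesce/swap/resume pattern under invented names (IOPath, quiesce, swap, resume); it is not the paper's implementation.

```python
# Illustrative quiesce/swap/resume reconfiguration of an explicit I/O path.
# All names are hypothetical.

class IOPath:
    def __init__(self, transport):
        self.transport = transport          # one component of the host-to-disk path
        self.pending = []                   # requests issued while quiesced
        self.quiesced = False

    def submit(self, request):
        if self.quiesced:
            self.pending.append(request)    # hold requests issued mid-reconfiguration
        else:
            self.transport(request)

    def quiesce(self):
        self.quiesced = True

    def swap(self, new_transport):
        self.transport = new_transport      # e.g. migrate to a different network technology

    def resume(self):
        self.quiesced = False
        for request in self.pending:        # drain buffered requests over the new path
            self.transport(request)
        self.pending.clear()

path = IOPath(lambda r: print("ethernet:", r))
path.submit("read block 7")
path.quiesce()
path.submit("read block 8")                 # buffered, not lost, during reconfiguration
path.swap(lambda r: print("myrinet:", r))
path.resume()                                # "read block 8" now flows over the new transport
```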
{"title":"Simplifying administration through dynamic reconfiguration. in a cooperative cluster storage system","authors":"Renaud Lachaize, J. Hansen","doi":"10.1109/CLUSTR.2004.1392615","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392615","url":null,"abstract":"Cluster storage systems where storage devices are distributed across a large number of nodes are able to reduce the I/O bottleneck problems present in most centralized storage systems. However, such distributed storage devices are hard to manage efficiently. In this paper, we examine the use of explicit, component-based (command and data) paths between hosts and disks as a vehicle for performing nondisruptive storage system reconfiguration. We describe the mechanisms necessary to perform reconfigurations and show how they can be used to handle two management tasks: migration between network technologies and rebuilding a disk in a mirror. Our approach is validated through initial performance measurements of these two tasks using a prototype implementation. The results show that online reconfiguration is possible at a modest cost","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123112392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
State of InfiniBand in designing HPC clusters, storage/file systems, and data centers
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392594
D. Panda
Summary form only given. The tutorial aims to familiarize attendees with the InfiniBand Architecture (IBA), its benefits, available IBA hardware and software solutions, and the latest trends in designing high-end computing, networking, and storage systems with IBA, and to provide a critical assessment of whether IBA is ready for prime time.
{"title":"State of InfiniBand in designing HPC clusters, storage/file systems, and datacenters [datacenters read as data centers]","authors":"D. Panda","doi":"10.1109/CLUSTR.2004.1392594","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392594","url":null,"abstract":"Summary forn only given. The tutorial aims to familiarize with IBA, its benefits, available IBA hardware/software solutions, and the latest trends in designing high-end computing, networking, and storage systems with IBA, and providing a critical assessment of whether IBA is ready for prime-time or not.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124272728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392609
Pierre Lemarinier, Aurélien Bouteiller, T. Hérault, Géraud Krawezik, F. Cappello
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection and recovery for message passing systems, with differing impacts on application performance and on the capacity to tolerate a high fault rate. In a recent paper, we demonstrated that the main differences between pessimistic sender-based message logging and coordinated checkpointing are: 1) the communication latency and 2) the performance penalty in case of faults. Pessimistic message logging increases the latency, due to additional blocking control messages. When faults occur at a high rate, coordinated checkpointing implies a higher performance penalty than message logging due to a higher stress on the checkpoint server. We extend this study to improved versions of the message logging and coordinated checkpoint protocols, which respectively reduce the latency overhead of pessimistic message logging and the server stress of coordinated checkpointing. We detail the protocols and their implementation in the new MPICH-V fault-tolerant framework. We compare their performance against the previous versions, and we compare the novel message logging protocol against the improved coordinated checkpointing one using the NAS benchmarks on a typical high performance cluster equipped with a high-speed network. The contribution of this work is twofold: a) an original message logging protocol and an improved coordinated checkpointing protocol, and b) the comparison between them.
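As background on the first protocol family: in pessimistic sender-based message logging, every message is kept in the sender's memory and the receiver's delivery order is recorded (and, in the real protocol, acknowledged) before the message is consumed, so a crashed process can be replayed deterministically from the surviving logs. The Python sketch below illustrates only that bookkeeping; the names are invented and this is not the MPICH-V implementation.

```python
# Illustrative sketch of pessimistic sender-based message logging.
# Names are hypothetical; this is not the MPICH-V implementation.

class Process:
    def __init__(self, rank):
        self.rank = rank
        self.send_log = []    # copies of sent messages kept in sender memory
        self.recv_order = []  # delivery order; pessimistically recorded (and, in
                              # the real protocol, acknowledged) before delivery
        self.delivered = []

def send(sender, receiver, payload):
    seq = len(sender.send_log)
    sender.send_log.append((receiver.rank, seq, payload))   # log before sending
    deliver(receiver, sender.rank, seq, payload)

def deliver(receiver, src, seq, payload):
    receiver.recv_order.append((src, seq))   # reception order is fixed before use
    receiver.delivered.append(payload)

def replay(receiver, processes):
    """After a failure, re-deliver messages in the logged order using the
    copies still held in the senders' memories."""
    receiver.delivered = []
    for src, seq in receiver.recv_order:
        for dst, s, payload in processes[src].send_log:
            if dst == receiver.rank and s == seq:
                receiver.delivered.append(payload)

p = [Process(0), Process(1)]
send(p[0], p[1], "x=1")
send(p[0], p[1], "x=2")
p[1].delivered = []          # simulate losing the receiver's volatile state
replay(p[1], p)
print(p[1].delivered)        # ['x=1', 'x=2'], rebuilt from the sender's log
```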
{"title":"Improved message logging versus improved coordinated checkpointing for fault tolerant MPI","authors":"Pierre Lemarinier, Aurélien Bouteiller, T. Hérault, Géraud Krawezik, F. Cappello","doi":"10.1109/CLUSTR.2004.1392609","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392609","url":null,"abstract":"Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection and recovery for message passing systems with different impact on application performance and the capacity to tolerate a high fault rate. In a recent paper, we have demonstrated that the main differences between pessimistic sender based message logging and coordinated checkpointing are: 1) the communication latency and 2) the performance penalty in case of faults. Pessimistic message logging increases the latency, due to additional blocking control messages. When faults occur at a high rate, coordinated checkpointing implies a higher performance penalty than message logging due to a higher stress on the checkpoint server. We extend this study to improved versions of message logging and coordinated checkpoint protocols which respectively reduces the latency overhead of pessimistic message logging and the server stress of coordinated checkpoint. We detail the protocols and their implementation into the new MPICH-V fault tolerant framework. We compare their performance against the previous versions and we compare the novel message logging protocols against the improved coordinated checkpointing one using the NAS benchmark on a typical high performance cluster equipped with a high speed network. The contribution of This work is twofold: a) an original message logging protocol and an improved coordinated checkpointing protocol and b) the comparison between them.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125306726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GRID-enabled bioinformatics applications for comparative genomic analysis at the CBBC
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392652
A. Hunter, D. Schibeci, H. L. Hiew, M. Bellgard
Summary form only given. Bioinformatics is an important application area for grid computing. The grid computing issues that must be addressed to tackle current bioinformatics challenges include processing power, large-scale data access and management, security, application integration, data integrity and curation, control/automation/tracking of workflows, data format consistency, and resource discovery. In this poster, we describe preliminary steps taken to develop a grid environment to advance bioinformatics research. We developed a system called Grendel with the aim of providing bioinformatics researchers with transparent access to the basic computational resources used in their research. Grendel is a platform- and language-independent Web-services-based system for distributed resource management utilising Sun Grid Engine, which provides a single entry point for computational tasks while keeping the actual resources transparent to the user. Grendel is developed in Java and deployed using Tomcat. Client libraries have been developed in Perl and Java to provide access to the computational resources exported via Grendel.
{"title":"GRID-enabled bioinformatics applications for comparative genomic analysis at the CBBC","authors":"A. Hunter, D. Schibeci, H. L. Hiew, M. Bellgard","doi":"10.1109/CLUSTR.2004.1392652","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392652","url":null,"abstract":"Summary form only given. Bioinformatics is an important application area for grid computing. The grid computing issues required to tackle current bioinformatics challenges include processing power, large-scale data access and management, security, application integration, data integrity and curation, control/automation/tracking of workflows, data format consistency and resource discovery. In this poster, we describe preliminary steps taken to develop a grid environment to advance bioinformatics research. We developed a system called Grendel, with the aims of providing bioinformatics researchers transparent access to basic computational resources used in their research. Grendel is a platform and language independent Web-services based system for distributed resource management utilising Sun Grid Engine that provides a single entry point for computational tasks while keeping the actual resources transparent to the user. Grendel is developed in Java and deployed using the Tomcat. Client libraries have been developed in Perl and Java to provide access to computation resource exported via Grendel.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124747240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Management of grid jobs and data within SAMGrid
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392634
A. Baranovski, G. Garzoglio, I. Terekhov, A. Roy, T. Tannenbaum
When designing SAMGrid, a project for distributing high-energy physics computations on a grid, we discovered that it was challenging to decide where to place users' jobs. Jobs typically need to access hundreds of files, and each site has a different subset of the files. Our data system, SAM, knows what portion of a user's data may be at each site, but does not know how to submit grid jobs. Our job submission system, Condor-G, knows how to submit grid jobs, but originally it required users to choose grid sites and gave them no assistance in choosing. This work describes how we enhanced Condor-G to interact with SAM to make good decisions about where jobs should be executed, and thereby improve the performance of grid jobs that access large amounts of data. These enhancements are general enough to be applicable to grid computing beyond the data-intensive computing of SAMGrid.
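Data-aware placement of this kind can be pictured as ranking candidate sites by the fraction of a job's input files that the data system already reports as resident there, and preferring the site that needs the least data movement; in Condor-G such a preference is naturally expressed through the matchmaker's Rank expression. The sketch below is an illustrative stand-in for such a rule, not the actual SAMGrid code, and the site names and files are hypothetical.

```python
# Illustrative data-aware site ranking; not the actual SAMGrid/Condor-G code.

def best_site(job_files, site_catalog):
    """Pick the site that already holds the largest fraction of the job's files,
    so the least data has to be moved before the job can run."""
    def local_fraction(site):
        cached = site_catalog[site]
        return len(job_files & cached) / len(job_files)
    return max(site_catalog, key=local_fraction)

job_files = {"raw_001.dat", "raw_002.dat", "raw_003.dat", "raw_004.dat"}
site_catalog = {
    "site_a": {"raw_001.dat", "raw_002.dat", "raw_003.dat"},   # 75% already local
    "site_b": {"raw_001.dat"},                                  # 25% already local
    "site_c": set(),                                            # nothing local
}
print(best_site(job_files, site_catalog))   # -> "site_a"
```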
{"title":"Management of grid jobs and data within SAMGrid","authors":"A. Baranovski, G. Garzoglio, I. Terekhov, A. Roy, T. Tannenbaum","doi":"10.1109/CLUSTR.2004.1392634","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392634","url":null,"abstract":"When designing SAMGrid, a project for distributing high-energy physics computations on a grid, we discovered that it was challenging to decide where to place user's jobs. Jobs typically need to access hundreds of files, and each site has a different subset of the files. Our data system SAM knows what portion of a user's data may be at each site, but does not know how to submit grid jobs. Our job submission system Condor-G knows how to submit grid jobs, but originally it required users to choose grid sites and gave them no assistance in choosing. This work describes how we enhanced Condor-G to interact with SAM to make good decisions about where jobs should be executed, and thereby improve the performance of grid jobs that access large amounts of data. All these enhancements are general enough to be applicable to grid computing beyond the data-intensive computing with SAMGrid.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114707099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392610
Weikuan Yu, D. Panda, Darius Buntinas
All-to-all broadcast is one of the common collective operations involving dense communication between all processes in a parallel program. Previously, programmable network interface cards (NICs) have been leveraged to efficiently support collective operations including barrier, broadcast, and reduce. This work explores the characteristics of all-to-all broadcast and proposes new algorithms to exploit the potential advantages of NIC programmability. Along with these algorithms, salient strategies are used to provide scalable topology management, global buffer management, efficient communication processing, and message reliability. The algorithms have been incorporated into a NIC-based collective protocol over Myrinet/GM. The NIC-based operations improve all-to-all broadcast bandwidth over 16 nodes by a factor of 3 compared to host-based all-to-all broadcast, and they have been demonstrated to achieve better scalability to large systems and very low host CPU utilization.
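As background, all-to-all broadcast (allgather) leaves every process holding every other process's block. The sketch below shows a plain host-level ring algorithm for this pattern, simulated sequentially in Python; the paper's contribution is to offload the forwarding onto the programmable Myrinet NIC, which the sketch does not model.

```python
# Host-level ring all-to-all broadcast (allgather) sketch for P processes,
# simulated sequentially; shown only to illustrate the communication pattern.

def ring_allgather(local_msgs):
    p = len(local_msgs)
    gathered = [[None] * p for _ in range(p)]
    for rank in range(p):
        gathered[rank][rank] = local_msgs[rank]     # each rank starts with its own block
    # In step s, every rank forwards the block it obtained in step s-1 to its
    # right neighbour, so after p-1 steps each rank holds all p blocks.
    for step in range(p - 1):
        for rank in range(p):
            src_block = (rank - step) % p           # block this rank forwards now
            right = (rank + 1) % p
            gathered[right][src_block] = gathered[rank][src_block]
    return gathered

print(ring_allgather(["a", "b", "c", "d"]))   # every rank ends with ['a','b','c','d']
```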
{"title":"Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM","authors":"Weikuan Yu, D. Panda, Darius Buntinas","doi":"10.1109/CLUSTR.2004.1392610","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392610","url":null,"abstract":"All-to-all broadcast is one of the common collective operations that involve dense communication between all processes in a parallel program. Previously, programmable network interface cards (NICs) have been leveraged to efficiently support collective operations, including barrier, broadcast, and reduce. This work explores the characteristics of all-to-all broadcast and proposes new algorithms to exploit the potential advantages of NIC programmability. Along with these algorithms, salient strategies have been used to provide scalable topology management, global buffer management, efficient communication processing, and message reliability. The algorithms have been incorporated into a NIC-based collective protocol over Myrinet/GM. The NIC-based all-to-all broadcast operations improve all-to-all broadcast bandwidth over 16 nodes by a factor of 3, compared to host-based all-to-all broadcast operation. Furthermore, the NIC-based operations have been demonstrated to achieve better scalability to large systems and very low host CPU utilization.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134536845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computation-at-risk: employing the grid for computational risk management
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392633
S. Kleban, S. Clearwater
This work expands upon our earlier work on the concept of computation-at-risk (CaR). CaR refers to the risk that certain computations may not get done in a timely manner. We examine a number of CaR distributions on several large clusters. The important contribution of this work is that it shows that CaR-reducing strategies exist, and that by employing such strategies a facility can significantly reduce the risk of inefficient resource utilization. Grids are shown to be one means of employing a CaR-reducing strategy. For example, we show that a CaR-reducing strategy applied to a common queue can have a dramatic effect on the wait times for jobs on a grid of clusters. In particular, we define a CaR Sharpe rule that provides a decision rule for determining the best machine in a grid on which to place a new job.
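In finance the Sharpe ratio is excess return divided by volatility; one plausible reading of a Sharpe-style placement rule, shown below purely as an illustration, is to rank machines by expected turnaround improvement per unit of turnaround variability. The paper's exact definition of the CaR Sharpe rule may differ, and the cluster names and numbers here are hypothetical.

```python
# A plausible Sharpe-style placement rule: trade expected turnaround against
# its variability. This is only an interpretation for illustration; the paper's
# exact CaR Sharpe rule may be defined differently.
import statistics

def sharpe_score(turnaround_samples, baseline):
    """Higher is better: mean improvement over a baseline turnaround,
    per unit of standard deviation (risk) of that turnaround."""
    mean_t = statistics.mean(turnaround_samples)
    std_t = statistics.stdev(turnaround_samples)
    return (baseline - mean_t) / std_t

# Hypothetical recent turnaround times (hours) for the same job class.
history = {
    "cluster_a": [2.0, 2.2, 1.9, 2.1, 2.0],     # fast and predictable
    "cluster_b": [1.2, 4.8, 1.0, 5.5, 1.1],     # sometimes faster, highly variable
}
baseline = 6.0                                   # e.g. a grid-wide average turnaround
best = max(history, key=lambda m: sharpe_score(history[m], baseline))
print(best)   # cluster_a: slightly slower on average but far less risky
```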
{"title":"Computation-at-risk: employing the grid for computational risk management","authors":"S. Kleban, S. Clearwater","doi":"10.1109/CLUSTR.2004.1392633","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392633","url":null,"abstract":"This work expands upon our earlier work involving the concept of computation-at-risk (CaR). In particular, CaR refers to the risk that certain computations may not get done within a timely manner. We examine a number of CaR distributions on several large clusters. The important contribution of This work is that it shows that there exist CaR-reducing strategies and by employing such strategies, a facility can significantly reduce the risk of inefficient resource utilization. Grids are shown to be one means for employing a CaR-reducing strategy. For example, we show that a CaR-reducing strategy applied to a common queue can have a dramatic effect on the wait times for jobs on a grid of clusters. In particular, we defined a CaR Sharpe rule that provides a decision rule for determining the best machine in a grid to place a new job.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130829454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comparison of local and gang scheduling on a Beowulf cluster
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392601
P. Strazdins, Johannes Uhlmann
Gang scheduling and related techniques are widely believed to be necessary for efficient job scheduling on distributed memory parallel computers, because they minimize context-switching overheads and permit the currently running parallel job to progress at the fastest possible rate. However, on cluster computers, and particularly those with COTS networks, these benefits can be outweighed in a time-sharing context with multiple jobs by the loss of the ability to use the CPU for other jobs while the current job is waiting for messages. Experiments on a Linux Beowulf cluster with 100 Mbit/s Fast Ethernet switches compare SCore buddy-based gang scheduling with local scheduling (provided by the Linux 2.4 kernel, with MPI implemented over TCP/IP). Results for communication-intensive numerical applications on 16 nodes reveal that gang scheduling results in slowdowns up to a factor of two greater for eight simultaneous jobs. This phenomenon is not due to any deficiency in SCore but to the relative costs of context switching versus message overhead, and we expect similar results to hold for any gang scheduling implementation. A performance analysis of local scheduling indicates that cache pollution due to context switching is more significant than the direct context-switching overhead for the applications studied. When this is taken into account, local scheduling comes close to achieving ideal slowdowns for finer-grained computations such as Linpack. The performance models also indicate that similar trends are to be expected for clusters with faster networks.
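The underlying trade-off can be captured in a toy model: with gang scheduling, J time-shared jobs each see a slowdown of about J, whereas with local scheduling only the compute-bound fraction of the jobs competes for the CPU, provided one job's message waits can be filled with other jobs' work. The sketch below encodes that idealised bound under the stated assumptions; it is not the paper's measured performance model.

```python
# Toy slowdown model contrasting gang and local scheduling for J time-shared
# jobs. Assumes perfect overlap of one job's communication waits with other
# jobs' computation and ignores context-switch and cache-pollution costs,
# so it is an idealised bound, not the paper's model.

def gang_slowdown(num_jobs):
    # With gang scheduling each job gets a 1/J share of every time slice.
    return num_jobs

def local_slowdown(num_jobs, compute_fraction):
    # Under local scheduling only the CPU-bound fraction of the jobs competes
    # for the processor; one job's message waits are absorbed by the others.
    busy = num_jobs * compute_fraction          # total CPU demand per processor
    return max(1.0, busy)                       # below 1.0 the job is not slowed at all

for f in (1.0, 0.75, 0.5):
    print(f, gang_slowdown(8), round(local_slowdown(8, f), 2))
# e.g. for a job that computes 50% of the time, the ideal slowdown with
# 8 jobs is 4 under local scheduling versus 8 under gang scheduling.
```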
{"title":"A comparison of local and gang scheduling on a Beowulf cluster","authors":"P. Strazdins, Johannes Uhlmann","doi":"10.1109/CLUSTR.2004.1392601","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392601","url":null,"abstract":"Gang scheduling and related techniques are widely believed to be necessary for efficient job scheduling on distributed memory parallel computers. This is because they minimize context switching overheads and permit the parallel job currently running to progress at the fastest possible rate. However, in the case of cluster computers, and particularly those with COTS networks, these benefits can be outweighed in the multiple jobs time-sharing context by the loss the ability to utilize the CPU for other jobs when the current job is waiting for messages. Experiments on a Linux Beowulf cluster with 100 Mb fast Ethernet switches are made comparing the SCore buddy-based gang scheduling with local scheduling (provided by the Linux 2.4 kernel with MPI implemented over TCP/IP). Results for communication-intensive numerical applications on 16 nodes reveal that gang scheduling results in 'slowdowns ' up to a factor of two greater for 8 simultaneous jobs. This phenomenon is not due to any deficiencies in SCore but due to the relative costs of context switching versus message overhead, and we expect similar results holds for any gang scheduling implementation. A performance analysis of local scheduling indicates that cache pollution due to context switching is more significant than the direct context switching overhead on the applications studied. When this is taken into account, local scheduling behaviour comes close to achieving ideal slowdowns for finer-grained computations such as Linpack. The performance models also indicate that similar trends are to be expected for clusters with faster networks.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134528140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MPIIMGEN - a code transformer that parallelizes image processing codes to run on a cluster of workstations
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392596
U. V. Vinod, P. K. Baruah
An enormous body of image and video processing software has been written for conventional (sequential) desktop computers, implementing a wide range of operations such as convolution, histogram equalization, and template matching. These applications usually have tremendous potential for parallelism, but a significant barrier to exploiting it is the difficulty of writing parallel software. This work presents the design and implementation of MPIIMGEN, a code transformer that automatically transforms sequential image processing codes into parallel codes capable of running on a cluster of workstations. The tool uses a pattern-driven approach to parallelize the sequential codes.
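The kind of rewrite such a transformer targets can be illustrated by a per-pixel loop turned into a row-block scatter/compute/gather pattern. The sketch below shows that pattern in plain Python with the ranks simulated sequentially; MPIIMGEN itself generates MPI code, and the function names here are illustrative only.

```python
# Sketch of the row-block decomposition pattern a parallelizing transformer
# would target for a per-pixel image operation; ranks are simulated
# sequentially here instead of generating real MPI code.

def brighten(pixel):
    return min(255, pixel + 40)

def sequential(image):
    return [[brighten(px) for px in row] for row in image]

def parallel_pattern(image, num_ranks):
    rows = len(image)
    # Each "rank" processes a contiguous block of rows (scatter) ...
    blocks = []
    for rank in range(num_ranks):
        lo = rank * rows // num_ranks
        hi = (rank + 1) * rows // num_ranks
        blocks.append([[brighten(px) for px in row] for row in image[lo:hi]])
    # ... and the partial results are concatenated back (gather).
    return [row for block in blocks for row in block]

image = [[10 * r + c for c in range(4)] for r in range(6)]
assert parallel_pattern(image, num_ranks=3) == sequential(image)
```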
{"title":"MPIIMGEN - a code transformer that parallelizes image processing codes to run on a cluster of workstations","authors":"U. V. Vinod, P. K. Baruah","doi":"10.1109/CLUSTR.2004.1392596","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392596","url":null,"abstract":"An enormous body of image and video processing software has been written for conventional (sequential) desktop computers. These implement a wide range of operations, such as convolution, histogram equalization and template matching. These applications usually have a tremendous potential for parallelism. However a significant barrier in exploiting such parallelism is the difficulty of writing parallel software. In this work, the design and implementation of MPIIMGEN -- a code transformer that automatically transforms these sequential image processing codes into parallel codes that are capable of running on a cluster of workstations is presented. This tool uses a pattern driven approach to parallelize the sequential codes.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114721323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}