Integrating trust into grid resource management systems
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040858
Farag Azzedin, Muthucumaru Maheswaran
Grid computing systems, the focus of much research in recent years, provide a virtual framework for the controlled sharing of resources across institutional boundaries. Security is a major concern in any system that enables remote execution. Several techniques can be used to provide security in grid systems, including sandboxing, encryption, and other access control and authentication mechanisms. However, the additional overhead caused by these mechanisms may negate the performance advantages gained by grid computing. Hence, we contend that it is essential for the scheduler to consider security implications while performing resource allocations. In this paper, we present a trust model for grid systems and show how the model can be used to incorporate security implications into scheduling algorithms. Three scheduling heuristics that can be used in a grid system are modified to incorporate the trust notion, and simulations are performed to evaluate their performance.
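The trust-aware scheduling idea can be made concrete with a small sketch: below, a min-min-style heuristic charges each candidate machine a security overhead that shrinks as its trust level grows. The cost model, the `trust` and `eta` tables, and all names are illustrative assumptions, not the paper's actual trust model.

```python
def trust_aware_min_min(tasks, machines, eta, trust, penalty=1.0):
    """Map each task to a machine, min-min style, where eta[t][m] is the
    estimated execution time of task t on machine m and trust[m] in [0, 1]
    scales a security overhead: a fully trusted machine pays none."""
    ready = dict.fromkeys(machines, 0.0)   # machine -> earliest free time
    schedule = {}
    unassigned = set(tasks)
    while unassigned:
        # Over all (task, machine) pairs, pick the pair whose
        # trust-adjusted completion time is smallest.
        task, machine, finish = min(
            ((t, m, ready[m] + eta[t][m] * (1 + penalty * (1 - trust[m])))
             for t in unassigned for m in machines),
            key=lambda x: x[2])
        schedule[task] = machine
        ready[machine] = finish
        unassigned.remove(task)
    return schedule
```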
{"title":"Integrating trust into grid resource management systems","authors":"Farag Azzedin, Muthucumaru Maheswaran","doi":"10.1109/ICPP.2002.1040858","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040858","url":null,"abstract":"Grid computing systems that have been the focus of much research in recent years provide a virtual framework for controlled sharing of resources across institutional boundaries. Security is a major concern in any system that enables remote execution. Several techniques can be used for providing security in grid systems including sandboxing, encryption, and other access control and authentication mechanisms. The additional overhead caused by these mechanisms may negate the performance advantages gained by grid computing. Hence, we contend that it is essential for the scheduler to consider the security implications while performing resource allocations. In this paper, we present a trust model for grid systems and show how the model can be used to incorporate security implications into scheduling algorithms. Three scheduling heuristics that can be used in a grid system are modified to incorporate the trust notion and simulations are performed to evaluate the performance.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130882413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Honey, I shrunk the Beowulf!
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040868
Wu-chun Feng, Michael S. Warren, E. Weigle
In this paper, we present a novel twist on the Beowulf cluster: the Bladed Beowulf. Designed by RLX Technologies and integrated and configured at Los Alamos National Laboratory, our Bladed Beowulf consists of compute nodes made from commodity off-the-shelf parts mounted on motherboard blades measuring 14.7" × 4.7" × 0.58". Each motherboard blade (node) contains a 633 MHz Transmeta TM5600™ CPU, 256 MB of memory, a 10 GB hard disk, and three 100 Mb/s Fast Ethernet network interfaces. Using a chassis provided by RLX, twenty-four such nodes mount side-by-side in a vertical orientation to fit in a rack-mountable 3U space, i.e., 19" wide and 5.25" high. A Bladed Beowulf can reduce the total cost of ownership (TCO) of a traditional Beowulf by a factor of three while providing Beowulf-like performance. Accordingly, rather than use the traditional definition of the price-performance ratio, where price is the cost of acquisition, we introduce a new metric called ToPPeR: total price-performance ratio, where total price encompasses TCO. We also propose two related (but more concrete) metrics: performance-space ratio and performance-power ratio.
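To see how ToPPeR differs from the traditional price-performance ratio, here is a worked calculation; every dollar figure and Gflop rating below is invented purely for illustration and is not a measurement from the paper.

```python
def topper(acquisition, power, cooling, space, admin, gflops):
    """Total price-performance: dollars of TCO per delivered Gflop."""
    return (acquisition + power + cooling + space + admin) / gflops

def price_performance(acquisition, gflops):
    """Traditional ratio: acquisition cost only."""
    return acquisition / gflops

# Hypothetical traditional vs. bladed clusters of similar raw speed:
trad  = topper(acquisition=100_000, power=30_000, cooling=20_000,
               space=25_000, admin=75_000, gflops=20.0)
blade = topper(acquisition=110_000, power=8_000, cooling=5_000,
               space=6_000, admin=20_000, gflops=18.0)
print(f"ToPPeR: traditional ${trad:,.0f}/Gflop vs. bladed ${blade:,.0f}/Gflop")
# The bladed machine loses slightly on acquisition-only price-performance
# but wins once power, cooling, space, and administration are counted.
```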
{"title":"Honey, I shrunk the Beowulf!","authors":"Wu-chun Feng, Michael S. Warren, E. Weigle","doi":"10.1109/ICPP.2002.1040868","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040868","url":null,"abstract":"In this paper, we present a novel twist on the Beowulf cluster - the Bladed Beowulf. Designed by RLX Technologies and integrated and configured at Los Alamos National Laboratory, our Bladed Beowulf consists of compute nodes made from commodity off-the-shelf parts mounted on motherboard blades measuring 14.7\" /spl times/ 4.7\" /spl times/ 0.58\". Each motherboard blade (node) contains a 633 MHz Trans-meta TM5600/spl trade/ CPU, 256 MB memory, 10 GB hard disk, and three 100-Mb/s Fast Ethernet network interfaces. Using a chassis provided by RLX, twenty-four such nodes mount side-by-side in a vertical orientation to fit in a rack-mountable 3U space, i.e., 19\" in width and 5.25\" in height. A Bladed Beowulf can reduce the total cost of ownership (TCO) of a traditional Beowulf by a factor of three while providing Beowulf-like performance. Accordingly, rather than use the traditional definition of price-performance ratio where price is the cost of acquisition, we introduce a new metric called ToPPeR: total price-performance ratio, where total price encompasses TCO. We also propose two related (but more concrete) metrics: performance-space ratio and performance-power ratio.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122539267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDBS: a duplication based scheduling algorithm for heterogeneous computing systems
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040891
A. Doğan, F. Özgüner
Finding an optimal solution to the problem of scheduling an application modeled by a directed acyclic graph (DAG) onto a set of heterogeneous machines is known to be NP-hard. In this study, we present a duplication-based scheduling algorithm, the levelized duplication based scheduling (LDBS) algorithm, which addresses this problem efficiently. The primary goal of LDBS is to minimize the schedule length of applications. Thanks to its modular design, LDBS can accommodate different duplication heuristics; specifically, we have designed two duplication heuristics with different time complexities. Simulation studies confirm that LDBS is a very competitive scheduling algorithm in terms of minimizing the schedule length of applications.
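Two ingredients behind any levelized, duplication-based scheduler can be sketched briefly: levelizing the DAG so tasks are considered level by level, and the basic duplication test (copy a parent onto the child's machine when recomputing it locally beats waiting for its message). LDBS's actual heuristics are richer; the names and cost model below are assumptions.

```python
from collections import deque

def levelize(tasks, preds):
    """preds: task -> set of predecessor tasks. A task's level is the
    length of its longest predecessor chain (entry tasks are level 0)."""
    succs = {t: [] for t in tasks}
    indegree = {t: len(preds[t]) for t in tasks}
    for t in tasks:
        for p in preds[t]:
            succs[p].append(t)
    level, ready = {}, deque(t for t in tasks if indegree[t] == 0)
    while ready:                       # Kahn-style topological sweep
        t = ready.popleft()
        level[t] = max((level[p] + 1 for p in preds[t]), default=0)
        for s in succs[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return level

def worth_duplicating(parent_finish_remote, comm, exec_here, slot_free):
    """Duplicate the parent on the child's machine iff a local recompute
    delivers its result sooner than the message from the remote copy."""
    return slot_free + exec_here < parent_finish_remote + comm
```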
{"title":"LDBS: a duplication based scheduling algorithm for heterogeneous computing systems","authors":"A. Doğan, F. Özgüner","doi":"10.1109/ICPP.2002.1040891","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040891","url":null,"abstract":"Finding an optimal solution to the problem of scheduling an application modeled by a directed acyclic graph (DAG) onto a set of heterogeneous machines is known to be an NP-hard problem. In this study, we present a duplication based scheduling algorithm, namely the levelized duplication based scheduling (LDBS) algorithm, which solves this problem efficiently. The primary goal of LDBS is to minimize the schedule length of applications. LDBS can accommodate different duplication heuristics, thanks to its modular design. Specifically, we have designed two different duplication heuristics with different time complexities. The simulation studies confirm that LDBS is a very competitive scheduling algorithm in terms of minimizing the schedule length of applications.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126579189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tolerating network failures in system area networks
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040866
Jeffrey Tang, A. Bilas
In this paper, we investigate how system area networks can deal with transient and permanent network failures. We design and implement a firmware-level retransmission scheme to tolerate transient failures and an on-demand network mapping scheme to deal with permanent failures. Both schemes are transparent to applications and are conceptually simple and suitable for low-level implementations, e.g., in firmware. We then examine how the retransmission scheme affects system performance and how various protocol parameters impact system behavior. We analyze and evaluate system performance using a real implementation on a state-of-the-art cluster, with both micro-benchmarks and real applications from the SPLASH-2 suite.
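The sender side of a retransmission scheme of this kind might look schematically like the following: packets stay in an unacknowledged window until an ack arrives, and are resent on timeout. The `link` interface, window size, and timeout value are invented for illustration; the paper's firmware implementation necessarily differs.

```python
import time

class RetransmitSender:
    """Sketch of a sender that resends unacknowledged packets on timeout.
    `link` is a hypothetical interface with transmit(pkt) and
    receive_acks() -> iterable of acknowledged sequence numbers."""

    def __init__(self, link, window=8, timeout=0.005):
        self.link, self.window, self.timeout = link, window, timeout
        self.unacked = {}          # seq -> (packet, time last sent)
        self.next_seq = 0

    def send(self, payload):
        while len(self.unacked) >= self.window:
            self._poll_acks()      # flow control: wait for window space
        pkt = (self.next_seq, payload)
        self.link.transmit(pkt)
        self.unacked[self.next_seq] = (pkt, time.monotonic())
        self.next_seq += 1

    def _poll_acks(self):
        for seq in self.link.receive_acks():
            self.unacked.pop(seq, None)        # delivered; stop tracking
        now = time.monotonic()
        for seq, (pkt, sent) in list(self.unacked.items()):
            if now - sent > self.timeout:      # presume lost; resend
                self.link.transmit(pkt)
                self.unacked[seq] = (pkt, now)
```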
{"title":"Tolerating network failures in system area networks","authors":"Jeffrey Tang, A. Bilas","doi":"10.1109/ICPP.2002.1040866","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040866","url":null,"abstract":"In this paper, we investigate how system area networks can deal with transient and permanent network failures. We design and implement a firmware-level retransmission scheme to tolerate transient failures and an on-demand network mapping scheme to deal with permanent failures. Both schemes are transparent to applications and are conceptually simple and suitable for low-level implementations, e.g. in firmware. We then examine how the retransmission scheme affects system performance and how various protocol parameters impact system behavior. We analyze and evaluate system performance by using a real implementation on a state-of-the art cluster and both micro-benchmarks and real applications from the SPLASH-2 suite.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122483116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Space and time efficient parallel algorithms and software for EST clustering
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040889
A. Kalyanaraman, S. Aluru, S. Kothari
Expressed sequence tags (ESTs) are DNA molecules experimentally derived from expressed portions of genes. Clustering of ESTs is essential for gene recognition and for understanding important genetic variations such as those resulting in diseases. In this paper, we present the design and development of a parallel software system for EST clustering. To our knowledge, this is the first effort to address the problem of EST clustering in parallel. The novel features of our approach include 1) space-efficient algorithms that keep the space requirement linear in the size of the input data set, 2) a combination of algorithmic techniques that reduce the total work without sacrificing the quality of EST clustering, and 3) parallel processing to reduce the run-time and facilitate the clustering of large datasets. Using a combination of these techniques, we report the clustering of 81,414 Arabidopsis ESTs in under 2.5 minutes on a 64-processor IBM SP, a problem estimated to take 9 hours of run-time with state-of-the-art software, provided the memory required to run that software can be made available.
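The clustering step itself can be pictured as incremental union-find over pairs of overlapping ESTs, as in the toy serial sketch below. Real systems, including the one described here, avoid the naive all-pairs loop with sophisticated filters and distribute the work in parallel; the `overlaps` predicate is a stand-in.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster_ests(ests, overlaps):
    """ests: sequence ids; overlaps(a, b) -> True if the pair shares a
    significant alignment. Returns id -> cluster representative."""
    parent = {e: e for e in ests}
    for i, a in enumerate(ests):
        for b in ests[i + 1:]:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb and overlaps(a, b):   # merge only distinct clusters
                parent[rb] = ra
    return {e: find(parent, e) for e in ests}
```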
{"title":"Space and time efficient parallel algorithms and software for EST clustering","authors":"A. Kalyanaraman, S. Aluru, S. Kothari","doi":"10.1109/ICPP.2002.1040889","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040889","url":null,"abstract":"Expressed sequence tags, ESTs, are DNA molecules experimentally derived from expressed portions of genes. Clustering of ESTs is essential for gene recognition and understanding important genetic variations such as those resulting in diseases. In this paper, we present the design and development of a parallel software system for EST clustering. To our knowledge, this is the first such effort to address the problem of EST clustering in parallel. The novel features of our approach include 1) design of space efficient algorithms to keep the space requirement linear in the size of the input data set, 2) a combination of algorithmic techniques to reduce the total work without sacrificing the quality of EST clustering, and 3) use of parallel processing to reduce the run-time and facilitate the clustering of large datasets. Using a combination of these techniques, we report the clustering of 81,414 Arabidopsis ESTs in under 2.5 minutes on a 64-processor IBM SP, a problem that is estimated to take 9 hours of run-time with a state-of-the-art software, provided the memory required to run the software can be made available.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131384925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ZEN: a directive-based language for automatic experiment management of distributed and parallel programs
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040863
R. Prodan, T. Fahringer
This paper describes ZEN, a directive-based language for specifying arbitrarily complex program executions by varying the problem, system, or machine parameters of parallel and distributed applications. ZEN introduces directives to substitute strings and to insert assignment statements inside arbitrary files, such as program files, input files, scripts, or makefiles. The programmer can thus invoke experiments for arbitrary value ranges of any problem parameter, including program variables, file names, compiler options, target machines, machine sizes, scheduling strategies, data distributions, etc. The number of experiments can be controlled through ZEN constraint directives. Finally, the programmer may request a large set of performance metrics to be computed for any code region of interest. The scope of ZEN directives can be restricted to arbitrary file or code regions. We implemented a prototype tool for automatic experiment management based on ZEN, and we report results for the performance analysis of an ocean simulation application and for the parameter study of a computational finance code.
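Conceptually, a ZEN-style experiment generator must expand every combination of annotated parameter values and prune the set with constraints. The sketch below shows only that expansion step; the function and parameter names are invented for illustration and are not ZEN's directive syntax.

```python
from itertools import product

def expand_experiments(parameters, constraint=lambda cfg: True):
    """parameters: name -> list of candidate values (one axis per
    substitute-style directive). Returns every combination that survives
    the constraint, one dict per experiment."""
    names = list(parameters)
    configs = []
    for values in product(*(parameters[n] for n in names)):
        cfg = dict(zip(names, values))
        if constraint(cfg):
            configs.append(cfg)
    return configs

# e.g. sweep machine sizes and a compiler flag, pruned the way a
# constraint directive would prune them:
runs = expand_experiments(
    {"nprocs": [4, 8, 16, 32], "opt": ["-O2", "-O3"]},
    constraint=lambda c: c["nprocs"] <= 16 or c["opt"] == "-O3",
)
print(len(runs))   # 7 of the 8 combinations survive
```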
{"title":"ZEN: a directive-based language for automatic experiment management of distributed and parallel programs","authors":"R. Prodan, T. Fahringer","doi":"10.1109/ICPP.2002.1040863","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040863","url":null,"abstract":"This paper describes ZEN, a directive-based language for the specification of arbitrarily complex program executions by varying the problem, system, or machine parameters for parallel and distributed applications. ZEN introduces directives to substitute strings and to insert assignment statements inside arbitrary files, such as program, input, script, or make-files. The programmer thus can invoke experiments for arbitrary value ranges of any problem parameter, including program variables, file names, compiler options, target machines, machine sizes, scheduling strategies, data distributions, etc. The number of experiments can be controlled through ZEN constraint directives. Finally, the programmer may request a large set of performance metrics to be computed for any code region of interest. The scope of ZEN directives can be restricted to arbitrary file or code regions. We implemented a prototype tool for automatic experiment management that is based on ZEN. We report results for the performance analysis of an ocean simulation application and for the parameter study of a computational finance code.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134124757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iterative grid-based computing using mobile agents
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040865
Hairong Kuang, L. Bic, M. Dillencourt
We describe an environment for the distributed solution of iterative grid-based applications, built using the MESSENGERS mobile agent system. The main advantage of paradigm-oriented distributed computing is that the user only needs to specify the application-specific sequential code, while the underlying infrastructure takes care of parallelization and distribution. The two paradigms discussed in this paper are the finite difference method and individual-based simulation. These paradigms present some interesting challenges, both in terms of performance (because they require frequent synchronized communication between nodes) and in terms of repeatability (because the mapping of the user space onto the network may change due to load balancing or changes in the underlying logical network). We describe their use, implementation, and performance within a mobile agent-based environment.
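In the finite difference paradigm, "paradigm-oriented" means the user supplies only a sequential update such as the Jacobi sweep below, while the runtime (here, MESSENGERS agents) handles partitioning, boundary exchange, and synchronization. The sketch is illustrative and is not the MESSENGERS API.

```python
def jacobi_step(grid):
    """One Jacobi relaxation sweep over the interior of a 2-D grid
    (a list of equal-length rows); boundary values are held fixed."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new
```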
{"title":"Iterative grid-based computing using mobile agents","authors":"Hairong Kuang, L. Bic, M. Dillencourt","doi":"10.1109/ICPP.2002.1040865","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040865","url":null,"abstract":"We describe an environment for the distributed solution of iterative grid-based applications. The environment is built using the MESSENGERS mobile agent system. The main advantage of paradigm-oriented distributed computing is that the user only needs to specify the application-specific sequential code, while the underlying infrastructure takes care of the parallelization and distribution. The two paradigms discussed in this papers are: the finite difference method, and individual-based simulation. These paradigms present some interesting challenges, both in terms of performance (because they require frequent synchronized communication between nodes) and in terms of repeatability (because the mapping of the user space onto the network may change due to load balancing or due to changes in the underlying logical network). We describe their use, implementation, and performance within a mobile agent-based environment.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130354807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multithreaded isosurface rendering on SMPs using span-space buckets
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040915
Peter Sulatycke, K. Ghose
We present in-core and out-of-core parallel techniques for implementing isosurface rendering based on the notion of span-space buckets. Our in-core technique makes conservative use of RAM and is amenable to parallelization. The out-of-core variant keeps the amount of data read during the search to a minimum, visiting only the cells that intersect the isosurface. The out-of-core technique additionally minimizes disk I/O time through in-order seeking, by interleaving data records on the disk, and by overlapping computational and I/O threads. The overall isosurface rendering time achieved using our out-of-core span-space buckets is comparable to that of well-optimized in-core techniques that have enough RAM at their disposal to avoid thrashing. When RAM is limited, our out-of-core span-space bucket technique maintains its performance level, while in-core algorithms either start to thrash or must sacrifice performance for a smaller memory footprint.
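The core data structure can be sketched in a few lines: each cell is bucketed by the (min, max) of its scalar values, so an isovalue query touches only buckets whose span can contain it. Bucket granularity and the out-of-core layout are the paper's contribution; the in-core sketch below, with invented names, shows only the indexing idea.

```python
from collections import defaultdict

class SpanSpaceBuckets:
    def __init__(self, width):
        self.width = width                     # bucket size along each axis
        self.buckets = defaultdict(list)       # (min_bin, max_bin) -> cells

    def insert(self, cell, vmin, vmax):
        key = (int(vmin // self.width), int(vmax // self.width))
        self.buckets[key].append((cell, vmin, vmax))

    def query(self, isovalue):
        """Yield exactly the cells whose span contains the isovalue.
        (A real implementation enumerates candidate keys directly
        instead of scanning all buckets as this sketch does.)"""
        b = int(isovalue // self.width)
        for (lo, hi), cells in self.buckets.items():
            if lo <= b <= hi:                  # bucket may intersect
                for cell, vmin, vmax in cells:
                    if vmin <= isovalue <= vmax:
                        yield cell
```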
{"title":"Multithreaded isosurface rendering on SMPs using span-space buckets","authors":"Peter Sulatycke, K. Ghose","doi":"10.1109/ICPP.2002.1040915","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040915","url":null,"abstract":"We present in-core and out-of-core parallel techniques for implementing isosurface rendering based on the notion of span-space buckets. Our in-core technique makes conservative use of the RAM and is amenable to parallelization. The out-of-core variant keeps the amount of data read in the search process to a minimum, visiting only the cells that intersect the isosurface. The out-of-core technique additionally minimizes disk I/O time through in-order seeking, interleaving data records on the disk and by overlapping computational and I/O threads. The overall isosurface rendering time achieved using our out-of-core span space buckets is comparable to that of well-optimized in-core techniques that have enough RAM at their disposal to avoid thrashing. When the RAM size is limited, our out-of-core span-space buckets maintains its performance level while in-core algorithms either start to thrash or must sacrifice performance for a smaller memory footprint.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"229 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134515656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A technique for adaptation to available resources on clusters independent of synchronization methods used
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040895
Umit Rencuzogullari, S. Dwarkadas
Clusters of workstations (COWs) offer high performance relative to their cost. Generally, these clusters operate as autonomous systems running independent copies of the operating system, where access to machines is not controlled and all users enjoy the same access privileges. While these features are desirable and reduce operating costs, they have adverse effects on parallel applications running on the clusters. Load imbalances are common for parallel applications on COWs due to: 1) variable amounts of load on nodes caused by an inherent lack of parallelism, 2) variable resource availability on nodes, and 3) independent scheduling decisions made by the independent schedulers on each node. Our earlier study showed that an approach combining static program analysis, dynamic load balancing, and scheduler cooperation is effective in countering these adverse effects. In the current study, we investigate the scalability of our approach as the number of processors is increased. We further relax the requirement of global synchronization, avoiding the need for barriers and allowing the use of any other synchronization primitives while still achieving dynamic load balancing. The use of alternative synchronization primitives avoids the inherent vulnerability of barriers to load imbalance. It also allows load balancing to take place at any point in the course of execution, rather than only at a synchronization point, potentially reducing the time the application runs imbalanced. Moreover, load readjustment decisions are made in a distributed fashion, so processes never need to synchronize globally in order to redistribute load.
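Distributed, barrier-free load readjustment can be reduced to a simple proportional split, as in this hedged sketch: two neighboring processes compare observed iteration rates and repartition their combined work, so no global synchronization is ever required. The helper and its calling convention are invented for illustration, not the paper's mechanism.

```python
def rebalance_pair(my_units, peer_units, my_rate, peer_rate):
    """Split a pair's combined work in proportion to observed iteration
    rates. Called between neighbors only, so no barrier is needed."""
    total = my_units + peer_units
    my_share = round(total * my_rate / (my_rate + peer_rate))
    return my_share, total - my_share

# A node doing 120 units at 8 units/s next to a peer doing 120 units at
# 4 units/s settles at a 160/80 split, equalizing expected finish times:
print(rebalance_pair(120, 120, 8.0, 4.0))   # -> (160, 80)
```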
{"title":"A technique for adaptation to available resources on clusters independent of synchronization methods used","authors":"Umit Rencuzogullari, S. Dwarkadas","doi":"10.1109/ICPP.2002.1040895","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040895","url":null,"abstract":"Clusters of workstations (COW) offer high performance relative to their cost. Generally these clusters operate as autonomous systems running independent copies of the operating system, where access to machines is not controlled and all users enjoy the same access privileges. While these features are desirable and reduce operating costs, they create adverse effects on parallel applications running on these clusters. Load imbalances are common for parallel applications on COWs due to: 1) variable amount of load on nodes caused by an inherent lack of parallelism, 2) variable resource availability on nodes, and 3) independent scheduling decisions made by the independent schedulers on each node. Our earlier study has shown that an approach combining static program analysis, dynamic load balancing, and scheduler cooperation is effective in countering the adverse effects mentioned above. In our current study, we investigate the scalability of our approach as the number of processors is increased. We further relax the requirement of global synchronization, avoiding the need to use barriers and allowing the use of any other synchronization primitives while still achieving dynamic load balancing. The use of alternative synchronization primitives avoids the inherent vulnerability of barriers to load imbalance. It also allows load balancing to take place at any point in the course of execution, rather than only at a synchronization point, potentially reducing the time the application runs imbalanced. Moreover, load readjustment decisions are made in a distributed fashion, thus preventing any need for processes to globally synchronize in order to redistribute load.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131618006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Power aware scheduling for AND/OR graphs in multiprocessor real-time systems
Pub Date: 2002-08-18. DOI: 10.1109/ICPP.2002.1040917
Dakai Zhu, Nevine AbouGhazaleh, D. Mossé, R. Melhem
Power-aware computing has become popular recently, and many techniques have been proposed to manage the energy consumption of traditional real-time applications. We have previously proposed (2001) two greedy slack sharing scheduling algorithms for such applications on multiprocessor systems. In this paper, we are concerned mainly with real-time applications that have different execution paths consisting of different numbers of tasks. The AND/OR graph model is used to represent the application's data dependence and control flow. The contribution of this paper is twofold. First, we extend our greedy slack sharing algorithm for traditional applications to deal with applications represented by AND/OR graphs. Then, using statistical information about the applications, we propose a few variations of speculative scheduling algorithms that aim to save energy by reducing the number of speed changes (and thus the overhead) while ensuring that the applications meet their timing constraints. The performance of the algorithms is analyzed with respect to energy savings. The results show that the greedy scheme is better than some speculative schemes and is good enough when a reasonable minimal speed exists in the system.
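Greedy slack reclamation can be illustrated with a toy single-chain version: when a task finishes under its worst-case budget, the leftover time flows to the next task, which then runs at a proportionally lower speed. The cubic power model and all numbers below are textbook assumptions for illustration, not the paper's multiprocessor algorithm.

```python
def run_with_slack(tasks, f_max=1.0, f_min=0.4):
    """tasks: list of (worst_case_time, actual_time), both measured at
    full speed f_max. Returns (finish_time, energy) under greedy slack
    reclamation with a cubic power-in-speed model."""
    slack, finish, energy = 0.0, 0.0, 0.0
    for wcet, actual in tasks:
        budget = wcet + slack                  # inherit slack left so far
        speed = max(f_min, min(f_max, actual / budget))
        elapsed = actual / speed               # lower speed, longer run
        energy += (speed ** 3) * elapsed       # P ~ f^3, so E = f^3 * t
        slack = budget - elapsed               # unused time flows onward
        finish += elapsed
    return finish, energy

# Three tasks with generous worst cases: slack from early finishes lets
# later tasks run slower, cutting energy versus always running at f_max
# (energy 9.0) while still finishing within the 12-unit cumulative budget.
print(run_with_slack([(4.0, 2.0), (4.0, 3.0), (4.0, 4.0)]))
```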
{"title":"Power aware scheduling for AND/OR graphs in multiprocessor real-time systems","authors":"Dakai Zhu, Nevine AbouGhazaleh, D. Mossé, R. Melhem","doi":"10.1109/ICPP.2002.1040917","DOIUrl":"https://doi.org/10.1109/ICPP.2002.1040917","url":null,"abstract":"Power aware computing has become popular recently and many techniques have been proposed to manage the energy consumption for traditional real-time applications. We have previously proposed (2001) two greedy slack sharing scheduling algorithms for such applications on multi-processor systems. In this paper, we are concerned mainly with real-time applications that have different execution paths consisting of different number of tasks. The AND/OR graph model is used to represent the application data dependence and control flow. The contribution of this paper is twofold. First, we extend our greedy slack sharing algorithm for traditional applications to deal with applications represented by AND/OR graphs. Then, using the statistical information about the applications, we propose a few variations of speculative scheduling algorithms that intend to save energy by reducing the number of speed changes (and thus the overhead) while ensuring that the applications meet the timing constraints. The performance of the algorithms is analyzed with respect to energy savings. The results obtained show that the greedy scheme is better than some speculative schemes and that the greedy scheme is good enough when a reasonable minimal speed exists in the system.","PeriodicalId":393916,"journal":{"name":"Proceedings International Conference on Parallel Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114873402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}